Main Track – IJCAI 2023

Accepted Papers List

StockFormer: Learning Hybrid Trading Machines with Predictive Coding

Siyu Gao, Yunbo Wang, Xiaokang Yang

Typical RL-for-finance solutions directly optimize trading policies over the noisy market data, such as stock prices and trading volumes, without explicitly considering the future trends and correlations of different investment assets as we humans do. In this paper, we present StockFormer, a hybrid trading machine that integrates the forward modeling capabilities of predictive coding with the advantages of RL agents in policy flexibility. The predictive coding part consists of three Transformer branches with modified structures, which respectively extract effective latent states of long-/short-term future dynamics and asset relations. The RL agent adaptively fuses these states and then executes an actor-critic algorithm in the unified state space. The entire model is jointly trained by propagating the critic’s gradients back to the predictive coding module. StockFormer significantly outperforms existing approaches across three publicly available financial datasets in terms of portfolio returns and Sharpe ratios.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning

PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem

Yiyuan Wang, Chenghou Jin, Shaowei Cai, Qingwei Lin

The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. Although SIP is theoretically hard, researchers have designed various algorithms for solving it over the last decade. In this work, we propose three main heuristics and develop an improved exact algorithm for SIP. First, we design a probing search procedure to test whether the search can obtain a solution at first sight. Second, we design a novel matching ordering as a value-ordering heuristic, which uses information obtained from the probing search procedure to preferentially select promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of SIP and present an adaptive propagation method that strikes a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for the SIP.

List of keywords

Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search

Non-Obvious Manipulability in Extensive-Form Mechanisms: the Revelation Principle for Single-Parameter Agents

Thomas Archbold, Bart de Keijzer, Carmine Ventre

Recent work in algorithmic mechanism design focuses on designing mechanisms for agents with bounded rationality, modifying the constraints that must be satisfied in order to achieve incentive compatibility. Starting with Li’s strengthening of strategyproofness, obvious strategyproofness (OSP) requires truthtelling to be “obvious” over dishonesty, roughly meaning that the worst outcome from truthful actions must be no worse than the best outcome for dishonest ones. A celebrated result for dominant-strategy incentive-compatible mechanisms that allows us to restrict attention to direct mechanisms, known as the revelation principle, does not hold for OSP: the implementation details matter for the obvious incentive properties of the mechanism. Studying agent strategies in real-life mechanisms, Troyan and Morrill introduce a relaxation of strategyproofness known as non-obvious manipulability, which only requires comparing certain extrema of the agents’ utility functions in order for a mechanism to be incentive-compatible. Specifically, a mechanism is not obviously manipulable (NOM) if the best and worst outcomes when acting truthfully are no worse than the best and worst outcomes when acting dishonestly. In this work we first extend the cycle monotonicity framework for direct-revelation NOM mechanism design to indirect mechanisms. We then apply this to two settings, single-parameter agents and mechanisms for two agents in which one has a two-value domain, and show that under these models the revelation principle holds: direct mechanisms are just as powerful as indirect ones.
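
The NOM condition stated in this abstract can be illustrated for a direct mechanism with a small check. This is only a sketch under assumptions that are not in the abstract: a single agent with a fixed true type, a finite table of utilities indexed by the agent's report and by the other agents' reports, and the hypothetical function name `is_not_obviously_manipulable`.

```python
from typing import Dict, Hashable

# utilities[report][others] is the agent's utility (for one fixed true type)
# when it submits `report` and the other agents submit `others`.
# This table layout is an assumption made for illustration.
Utilities = Dict[Hashable, Dict[Hashable, float]]


def is_not_obviously_manipulable(utilities: Utilities, truthful: Hashable) -> bool:
    """Compare best and worst outcomes of truthful and dishonest reports."""
    truthful_values = list(utilities[truthful].values())
    best_truthful = max(truthful_values)
    worst_truthful = min(truthful_values)
    for report, outcomes in utilities.items():
        if report == truthful:
            continue
        values = list(outcomes.values())
        # A lie must beat neither the best case nor the worst case of the truth.
        if max(values) > best_truthful or min(values) > worst_truthful:
            return False
    return True


# Lying pays off against profile "a" (2 > 1), so truth is not dominant,
# yet neither the best nor the worst case improves: the table passes NOM.
table = {"truth": {"a": 1, "b": 3}, "lie": {"a": 2, "b": 0}}
print(is_not_obviously_manipulable(table, "truth"))
```

The example table shows why NOM is a relaxation: a profitable deviation exists for one profile, but it is not visible when only the extrema are compared.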

List of keywords

Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics

HireVAE: an Online and Adaptive Factor Model based on Hierarchical and Regime-Switch VAE

Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin

The factor model is a fundamental tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations. However, it is still an open question how to build a factor model that can conduct stock prediction in an online and adaptive setting, where the model can adapt itself to match the current market regime identified based on only point-in-time market information. To tackle this problem, we propose the first deep learning based online and adaptive factor model, HireVAE, at the core of which is a hierarchical latent space that embeds the underlying relationship between the global market situation and stock-wise latent factors, so that HireVAE can effectively estimate useful latent factors given only historical market information and subsequently predict accurate stock returns. Across four commonly used real stock market benchmarks, the proposed HireVAE demonstrates superior performance in terms of active returns over previous methods, verifying the potential of such an online and adaptive factor model.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Applications

Teaching What You Should Teach: A Data-Based Distillation Method

Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu

In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias to find out what meets the teacher’s strengths but the student’s weaknesses, by learning magnitudes and probabilities to generate suitable data samples. By training the data augmentation module and the generalized distillation paradigm alternately, a student model is learned with excellent generalization ability. To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation

Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend

Chong Yu, Tao Chen, Zhongxue Gan

Adversarial attack is commonly regarded as a huge threat to neural networks because of its misleading behavior. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch and the mutual learning mechanism with the same learning ratio applied in prior art defense strategies are the main cause of the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even leads to a certain accuracy boost of common neural models on benign classification, object detection, and segmentation tasks. Quantitative and ablation experiments attribute the efficacy of AdvAmd to three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).

List of keywords

Machine Learning -> ML: Robustness
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Constraint Satisfaction and Optimization -> CSO: Applications

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang

Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a specially designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding

Self-supervised Graph Disentangled Networks for Review-based Recommendation

Yuyang Ren, Haonan Zhang, Qi Li, Luoyi Fu, Xinbing Wang, Chenghu Zhou

User review data is considered auxiliary information that alleviates the data sparsity problem and improves the quality of learned user/item or interaction representations in review-based recommender systems. However, existing methods usually model user-item interactions in a holistic manner and neglect the entanglement of the latent intents behind them, e.g., price, quality, or appearance, resulting in suboptimal representations and reducing interpretability. In this paper, we propose Self-supervised Graph Disentangled Networks for review-based recommendation (SGDN) to separately model the user-item interactions based on the latent factors through the textual review data. To this end, we first model the distributions of interactions over latent factors from both semantic information in review data and structural information in user-item graph data, forming several factor graphs. Then a factorized message passing mechanism is designed to learn disentangled user/item and interaction representations on the factor graphs. Finally, we set an intent-aware contrastive learning task to alleviate the sparsity issue and encourage disentanglement through dynamically identifying positive and negative samples based on the learned intent distributions. Empirical results over five benchmark datasets validate the superiority of SGDN over the state-of-the-art methods and the interpretability of learned intent factors.

List of keywords

Data Mining -> DM: Recommender systems
Data Mining -> DM: Collaborative filtering

100

Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi’s Entropy Perspective

Yuxin Dong, Tieliang Gong, Hong Chen, Chen Li

Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information theoretical measure: kernelized Rényi’s entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon’s entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Rényi’s entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.

List of keywords

Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Learning theory

108

A regular matching constraint for string variables

Roberto Amadini, Peter Stuckey

Using a regular language as a pattern for string matching is nowadays a common, and sometimes unsafe, operation, provided as a built-in feature by most programming languages. A proper constraint solver over string variables should thus support most of the operations over regular expressions and related constructs. However, state-of-the-art string solvers natively support only the membership relation of a string variable to a regular language. Here we take a step forward by defining a specialised propagator for the match operation, returning the leftmost position where a pattern can match a given string. Empirical evidence shows the effectiveness of our approach, implemented within the constraint programming framework, and tested against state-of-the-art string solvers.
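
On a fully known string, the match operation described in this abstract reduces to an ordinary regex search. The sketch below shows only that ground case using Python's standard `re` module; it is not the paper's propagator, which reasons over string variables. The function name `match_position` and the convention of 1-based positions with 0 meaning "no match" are assumptions made for illustration.

```python
import re


def match_position(pattern: str, text: str) -> int:
    """Leftmost position where `pattern` matches in `text`.

    Assumed convention: positions are 1-based and 0 means "no match".
    """
    found = re.search(pattern, text)  # re.search scans left to right
    return found.start() + 1 if found else 0


print(match_position(r"ab+", "xxabbb"))  # leftmost match starts at position 3
print(match_position(r"z", "abc"))       # no match, so 0
```

Note that a pattern accepting the empty string, such as `a*`, matches at position 1 of any string under this convention.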

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction

109

Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen

Conversational recommendation systems (CRS) aim to acquire users' dynamically preferred attributes in a timely and proactive manner through conversations for item recommendation. In each turn of CRS, there are naturally two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) to estimate the effectiveness of the director’s option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director’s option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.

List of keywords

Data Mining -> DM: Recommender systems

133

Timestamp-Supervised Action Segmentation from the Perspective of Clustering

Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Fuchun Sun

Video action segmentation under timestamp supervision has recently received much attention due to lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which includes the following two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences, where the frames in ambiguous intervals have no pseudo-labels. Second, iterative clustering iteratively propagates the pseudo-labels to the ambiguous intervals by clustering, and thus updates the pseudo-label sequences to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.

List of keywords

Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision

162

Decoupling with Entropy-based Equalization for Semi-Supervised Semantic Segmentation

Chuanghao Ding, Jianrong Zhang, Henghui Ding, Hongwei Zhao, Zhihui Wang, Tengfei Xing, Runbo Hu

Semi-supervised semantic segmentation methods are the main solution to alleviate the problem of high annotation cost in semantic segmentation. However, the class imbalance problem makes the model favor the head classes with sufficient training samples, resulting in poor performance on the tail classes. To address this issue, we propose a Decoupled Semi-Supervised Semantic Segmentation (DeS4) framework based on the teacher-student model. Specifically, we first propose a decoupling training strategy to split the training of the encoder and segmentation decoder, aiming at a balanced decoder. Then, a non-learnable prototype-based segmentation head is proposed to regularize the category representation distribution consistency and provide a better connection between the teacher model and the student model. Furthermore, a Multi-Entropy Sampling (MES) strategy is proposed to collect pixel representations for updating the shared prototype to get a class-unbiased head. We conduct extensive experiments with the proposed DeS4 on two challenging benchmarks (PASCAL VOC 2012 and Cityscapes) and achieve remarkable improvements over the previous state-of-the-art methods.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Segmentation

169

MM-PCQA: Multi-Modal Learning for No-reference Point Cloud Quality Assessment

Zicheng Zhang, Wei Sun, Xiongkuo Min, Qiyuan Wang, Jun He, Quan Zhou, Guangtao Zhai

The visual quality of point clouds has been greatly emphasized since the ever-increasing 3D vision applications are expected to provide cost-effective and high-quality experiences for users. Looking back on the development of point cloud quality assessment (PCQA), the visual quality is usually evaluated by utilizing single-modal information, i.e., extracted from either the 2D projections or the 3D point cloud. The 2D projections contain rich texture and semantic information but are highly dependent on viewpoints, while the 3D point clouds are more sensitive to geometry distortions and invariant to viewpoints. Therefore, to leverage the advantages of both point cloud and projected image modalities, we propose a novel no-reference Multi-Modal Point Cloud Quality Assessment (MM-PCQA) metric. Specifically, we split the point clouds into sub-models to represent local geometry distortions such as point shift and down-sampling. Then we render the point clouds into 2D image projections for texture feature extraction. To achieve these goals, the sub-models and projected images are encoded with point-based and image-based neural networks. Finally, symmetric cross-modal attention is employed to fuse multi-modal quality-aware information. Experimental results show that our approach outperforms all compared state-of-the-art methods and is far ahead of previous no-reference PCQA methods, which highlights the effectiveness of the proposed method. The code is available at https://github.com/zzc-1998/MM-PCQA.

List of keywords

Computer Vision -> CV: 3D computer vision
Machine Learning -> ML: Multi-modal learning

174

Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism

Xudong Guo, Daming Shi, Wenhui Fan

Communication can impressively improve cooperation in multi-agent reinforcement learning (MARL), especially for partially-observed tasks. However, existing works either broadcast the messages leading to information redundancy, or learn targeted communication by modeling all the other agents as targets, which is not scalable when the number of agents varies. In this work, to tackle the scalability problem of MARL communication for partially-observed tasks, we propose a novel framework Transformer-based Email Mechanism (TEM). The agents adopt local communication to send messages only to the ones that can be observed without modeling all the agents. Inspired by human cooperation with email forwarding, we design message chains to forward information to cooperate with the agents outside the observation range. We introduce Transformer to encode and decode the message chain to choose the next receiver selectively. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Multi-robot systems

193

SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles

Cuong Tran, Keyu Zhu, Ferdinando Fioretto, Pascal Van Hentenryck

A critical concern in data-driven processes is to build models whose outcomes do not discriminate against some demographic groups, including gender, ethnicity, or age. In learning tasks, knowledge of the group attributes is essential to ensure non-discrimination, but in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of the individuals’ sensitive information while also allowing it to learn non-discriminatory predictors. A key feature of the proposed model is to enable the use of off-the-shelf and non-private fair models to create a privacy-preserving and fair model. The paper analyzes the relation between accuracy, privacy, and fairness, and assesses the benefits of the proposed models on several prediction tasks. In particular, this proposal allows both scalable and accurate training of private and fair models for very large neural networks.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Data Mining -> DM: Privacy-preserving data mining
Machine Learning -> ML: Multi-task and transfer learning

194

MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels

ChuanYang Hu, Shipeng Yan, Zhitong Gao, Xuming He

Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction to reduce the cost is to learn with noisy labels, which are ubiquitous in real-world applications. A critical challenge for noisy image classification is to reduce the effect of network memorization on the false-labeled data. In this work, we propose an iterative selection approach capable of identifying clean data by considering the overall learning dynamics of each data instance. Different from the previous small-loss heuristics, we leverage the observation that deep networks find clean data easy to memorize and hard to forget. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times from misclassified to memorized and from memorized to misclassified in training, respectively. Then, we integrate them as the criterion for selection. Based on the proposed new criterion, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset. To validate our method, we perform exhaustive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.
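
The two per-instance measurements mentioned in this abstract can be sketched as simple counts over a training history. Several things here are assumptions rather than the authors' method: "transition times" is read as the number of transitions, "memorized" is read as "predicted label equals the given training label at that epoch", and the function name `transition_counts` is hypothetical. How the paper combines the two quantities into a selection criterion is not stated in the abstract and is not reproduced here.

```python
from typing import Sequence, Tuple


def transition_counts(correct_per_epoch: Sequence[bool]) -> Tuple[int, int]:
    """Count learning-dynamics transitions for one training instance.

    correct_per_epoch[t] is True when the instance is fit (memorized)
    at epoch t. Returns (misclassified -> memorized, memorized -> misclassified).
    """
    memorized = 0
    forgotten = 0
    for previous, current in zip(correct_per_epoch, correct_per_epoch[1:]):
        if not previous and current:
            memorized += 1   # the network has just fit this instance
        elif previous and not current:
            forgotten += 1   # the network has just lost this instance
    return memorized, forgotten


# An instance that is learned, forgotten once, then learned again.
history = [False, True, True, False, True]
print(transition_counts(history))  # (2, 1)
```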

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision

199

Approximate Envy-Freeness in Graphical Cake Cutting

Sheung Man Yuen, Warut Suksompong

We study the problem of fairly allocating a divisible resource in the form of a graph, also known as graphical cake cutting. Unlike for the canonical interval cake, a connected envy-free allocation is not guaranteed to exist for a graphical cake. We focus on the existence and computation of connected allocations with low envy. For general graphs, we show that there is always a 1/2-additive-envy-free allocation and, if the agents’ valuations are identical, a (2+ε)-multiplicative-envy-free allocation for any ε > 0. In the case of star graphs, we obtain a multiplicative factor of 3+ε for arbitrary valuations and 2 for identical valuations. We also derive guarantees when each agent can receive more than one connected piece. All of our results come with efficient algorithms for computing the respective allocations.
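
The two approximation notions named in this abstract can be made concrete by measuring the envy of a given allocation. The sketch assumes the usual definitions, which the abstract itself does not spell out: an allocation is α-additive-envy-free when each agent values its own piece at least as much as any other piece minus α, and α-multiplicative-envy-free when it values its own piece at least 1/α times any other piece. The function name `envy_levels` and the matrix input are illustrative assumptions; the paper's allocation algorithms are not reproduced.

```python
from typing import List, Tuple


def envy_levels(values: List[List[float]]) -> Tuple[float, float]:
    """Smallest additive and multiplicative envy bounds for an allocation.

    values[i][j] is agent i's value for the piece given to agent j
    (an assumed input format). Returns (alpha_additive, alpha_multiplicative).
    """
    additive = 0.0
    multiplicative = 1.0
    for i, row in enumerate(values):
        own = row[i]
        for j, other in enumerate(row):
            if i == j:
                continue
            additive = max(additive, other - own)
            if other > 0:
                # A worthless own piece next to a valued one gives unbounded envy.
                ratio = other / own if own > 0 else float("inf")
                multiplicative = max(multiplicative, ratio)
    return additive, multiplicative


# Agent 0 holds a piece it values at 0.25 and sees a piece it values at 0.75.
print(envy_levels([[0.25, 0.75], [0.5, 0.5]]))  # (0.5, 3.0)
```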

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice

200

LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search

Bhavna Gopal, Arjun Sridhar, Tunhou Zhang, Yiran Chen

Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6% on ImageNet under mobile constraints, best-in-class Kendall-Tau, architectural diversity, and search space size.

List of keywords

Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Automated machine learning

203

FedSampling: A Better Sampling Strategy for Federated Learning

Tao Qi, Fangzhao Wu, Lingjuan Lyu, Yongfeng Huang, Xing Xie

Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different data sizes, and under uniform client sampling the clients with more data do not get more opportunities to contribute to model training, which may lead to inferior performance. In this paper, instead of client uniform sampling, we propose a novel data uniform sampling strategy for federated learning (FedSampling), which can effectively improve the performance of federated learning, especially when the data size distribution is highly imbalanced across clients. In each federated learning round, local data on each client is randomly sampled for local model learning according to a probability based on the server's desired sample size and the total sample size on all available clients. Since the data size on each client is privacy-sensitive, we propose a privacy-preserving way to estimate the total sample size with a differential privacy guarantee. Experiments on four benchmark datasets show that FedSampling can effectively improve the performance of federated learning.
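
The per-round sampling step described in this abstract can be sketched as follows. The abstract says only that the probability is "based on" the server's desired sample size and the total sample size, so the specific choice here, keeping each local example independently with probability desired/total, is an assumption. The function name `sample_local_data` is hypothetical, and the differentially private estimate of the total size is not reproduced: `total_size` is simply passed in.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_local_data(
    local_data: Sequence[T],
    desired_size: int,
    total_size: float,
    rng: random.Random,
) -> List[T]:
    """Keep each local example independently with probability desired/total.

    total_size may be an estimate of the data size across all clients;
    the ratio is capped at 1 so it stays a valid probability.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    keep_probability = min(1.0, desired_size / total_size)
    return [example for example in local_data if rng.random() < keep_probability]


# A client holding 1000 of 10000 examples, with 500 desired overall,
# contributes about 50 examples in expectation.
rng = random.Random(0)
picked = sample_local_data(list(range(1000)), desired_size=500, total_size=10000, rng=rng)
print(len(picked))
```

Under this assumed rule every example has the same chance of being used regardless of which client holds it, so clients with more data contribute proportionally more per round.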

List of keywords

Machine Learning -> ML: Federated learning
Data Mining -> DM: Privacy-preserving data mining

204

A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning

Alexander Zadorojniy, Takayuki Osogami, Orit Davidovich

We consider the problem of risk-aware Markov Decision Processes (MDPs) for Safe AI. We introduce a theoretical framework, Extended Markov Ratio Decision Processes (EMRDP), that incorporates risk into MDPs and embeds environment learning into this framework. We propose an algorithm to find the optimal policy for EMRDP with theoretical guarantees. Under a certain monotonicity assumption, this algorithm runs in strongly-polynomial time both in the discounted and expected average reward models. We validate our algorithm empirically on a Grid World benchmark, evaluating its solution quality, required number of steps, and numerical stability. We find its solution quality to be stable under data noising, while its required number of steps grows with added noise. We observe its numerical stability compared to global methods.

List of keywords

Planning and Scheduling -> PS: Markov decisions processes
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Sequential decision making

222

Shhh! The Logic of Clandestine Operations

Pavel Naumov, Oliver Orejola

An operation is called covert if it conceals the identity of the actor; it is called clandestine if the very fact that the operation is conducted is concealed. The paper proposes a formal semantics of clandestine operations and introduces a sound and complete logical system that describes the interplay between the distributed knowledge modality and a modality capturing coalition power to conduct clandestine operations.

List of keywords

Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
Knowledge Representation and Reasoning -> KRR: Reasoning about actions

223

Appearance Prompt Vision Transformer for Connectome Reconstruction

Rui Sun, Naisong Luo, Yuwen Pan, Huayu Mai, Tianzhu Zhang, Zhiwei Xiong, Feng Wu

Neural connectivity reconstruction aims to understand the function of biological reconstruction, and promotes basic scientific research. The intricate morphology and densely intertwined branches make it an extremely challenging task. Most previous best-performing methods adopt affinity learning or metric learning. Nevertheless, they either neglect to model explicit voxel semantics because of implicit optimization or lag in capturing spatial information. Furthermore, the inherent locality of 3D CNNs limits the modeling of long-range dependencies and leads to sub-optimal results. In this work, we propose a coherent and unified Appearance Prompt Vision Transformer (APViT) to integrate affinity and metric learning to exploit the complementarity by learning long-range spatial dependencies. The proposed APViT enjoys several merits. First, the extension continuity-aware attention module aims at constructing hierarchical attention customized for neuron extensibility and slice continuity to learn instance voxel semantic context from a global perspective and utilize continuity priors to enhance voxel spatial awareness. Second, the appearance prompt modulator is responsible for leveraging voxel-adaptive appearance knowledge conditioned on affinity rich in spatial information to instruct instance voxel semantics, exploiting the potential of affinity learning to complement metric learning. Extensive experimental results on multiple challenging benchmarks demonstrate that our APViT achieves consistent improvements with huge flexibility under the same post-processing strategy.

List of keywords

Computer Vision -> CV: Biomedical image analysis

228

Adversarial Behavior Exclusion for Safe Reinforcement Learning

Md Asifur Rahman, Tongtong Liu, Sarra Alqahtani

Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, this learning process makes RL inherently too vulnerable to be used in real-world applications where safety is of utmost importance. Most prior studies consider exploration at odds with safety and thereby restrict it using either joint optimization of task and safety or imposing constraints for safe exploration. This paper migrates from the current convention to using exploration as a key to safety by learning safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent’s safety violations by approximating an optimal adversary utilizing exploration and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall and therefore can be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision processes (CMDP) environments under 2 white-box action space perturbations as well as with changes in environment dynamics against 7 baselines. Consistently, AdvEx-RL outperforms the baselines by achieving an average safety performance of over 75% in the continuous action space with 10 times more variations in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis as we show in our user study. This paper provides a novel study to investigate the robustness and interpretability of safe RL methods under deliberate perturbations.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability

243

A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition

Yibo Zhou, Hai-Miao Hu, Jinzuo Yu, Zhenbo Xu, Weiqing Lu, Yuran Cao

Recent studies on pedestrian attribute recognition progress with either explicit or implicit modeling of the co-occurrence among attributes. Considering that this prior is highly variable and unforeseeable across specific scenarios, we show that current methods can actually suffer when generalizing such fitted attribute interdependencies to scenes or identities outside the dataset distribution, resulting in the underlying bias of attribute co-occurrence. To render models robust in realistic scenes, we propose attributes-disentangled feature learning to ensure that the recognition of an attribute does not rely on the existence of others, which is subsequently formulated as a problem of mutual information minimization. Building on this, practical strategies are devised to efficiently decouple attributes, which substantially improve the baseline and establish state-of-the-art performance on realistic datasets like PETAzs and RAPzs.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Bias, fairness and privacy

251

Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation

Deyi Ji, Feng Zhao, Hongtao Lu

Most existing ultra-high resolution (UHR) segmentation methods struggle with the dilemma of balancing memory cost and local characterization accuracy, both of which are taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer), which achieves impressive performance. GPWFormer is a Transformer (T)-CNN (C) mutual learning framework, where T takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while C takes the downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, T partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. In addition, masks produced by C are utilized to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.

List of keywords

Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Segmentation

256

A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification

Mingcai Chen, Yu Zhao, Zhonghuang Wang, Bing He, Jianhua Yao

Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model’s predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known “confirmation bias” problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method’s effectiveness and superior performance on sequence-level and repertoire-level tasks. Code available at https://github.com/TencentAILabHealthcare/NLL-IRC.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine

260

Generalization Bounds for Adversarial Metric Learning

Wen Wen, Han Li, Hong Chen, Rui Wu, Lingjuan Wu, Liangxuan Zhu

Recently, adversarial metric learning has been proposed to enhance the robustness of the learned distance metric against adversarial perturbations. Despite rapid progress in validating its effectiveness empirically, theoretical guarantees on adversarial robustness and generalization are far less understood. To fill this gap, this paper focuses on unveiling the generalization properties of adversarial metric learning by developing the uniform convergence analysis techniques. Based on the capacity estimation of covering numbers, we establish the first high-probability generalization bounds with order O(n^{-1/2}) for adversarial metric learning with pairwise perturbations and general losses, where n is the number of training samples. Moreover, we obtain the refined generalization bounds with order O(n^{-1}) for the smooth loss by using local Rademacher complexity, which is faster than the previous result of adversarial pairwise learning, e.g., adversarial bipartite ranking. Experimental evaluation on real-world datasets validates our theoretical findings.

List of keywords

Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Learning theory

263

Improving LaCAM for Scalable Eventually Optimal Multi-Agent Pathfinding

Keisuke Okumura

This study extends the recently developed LaCAM algorithm for multi-agent pathfinding (MAPF). LaCAM is a sub-optimal search-based algorithm that uses lazy successor generation to dramatically reduce the planning effort. We present two enhancements. First, we propose its anytime version, called LaCAM*, which eventually converges to optima, provided that solution costs are accumulated transition costs. Second, we improve the successor generation to quickly obtain initial solutions. Exhaustive experiments demonstrate their utility. For instance, LaCAM* sub-optimally solved 99% of the instances retrieved from the MAPF benchmark, where the number of agents varied up to a thousand, within ten seconds on a standard desktop PC, while ensuring eventual convergence to optima, opening a new horizon for MAPF algorithms.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Distributed and multi-agent planning
Robotics -> ROB: Motion and path planning
Planning and Scheduling -> PS: Planning algorithms

268

NerCo: A Contrastive Learning based Two-stage Chinese NER Method

Zai Zhang, Bin Shi, Haokun Zhang, Huang Xu, Yaodong Zhang, Yuefei Wu, Bo Dong, Qinghua Zheng

Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition (NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in the target representation space, which could ultimately harm the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. We then present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo first guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart those of different ones. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance. We release our code at https://github.com/ijcainer/nerco.

List of keywords

Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities
Natural Language Processing -> NLP: Tagging, chunking, and parsing

283

A Canonicalization-Enhanced Known Fact-Aware Framework For Open Knowledge Graph Link Prediction

Yilin Wang, Minghao Hu, Zhen Huang, Dongsheng Li, Xicheng Lu, Wei Luo, Dong Yang

Open knowledge graph (OpenKG) link prediction aims to predict missing factual triples in the form of (head noun phrase, relation phrase, tail noun phrase), where existing triples are extracted from texts by open information extraction tools. Since triples are not canonicalized, previous methods either focus on canonicalizing noun phrases (NPs) to reduce graph sparsity, or utilize textual forms to improve type compatibility. However, they neglect to canonicalize relation phrases (RPs) and triples, making OpenKG maintain high sparsity and impeding the performance. To address the above issues, we propose a Canonicalization-Enhanced Known Fact-Aware (CEKFA) framework that boosts link prediction performance through sparsity reduction of RPs and triples. First, a similarity-driven RP canonicalization method is proposed to reduce RPs’ sparsity by sharing knowledge of semantically similar ones. Second, to reduce the sparsity of triples, a known fact-aware triple canonicalization method is designed to retrieve relevant known facts from training data. Finally, these two types of canonical information are integrated into a general two-stage re-ranking framework. Experiment results on two OpenKG datasets, ReVerb20K and ReVerb45K, show that our approach achieves state-of-the-art results. Extensive experimental analyses illustrate the effectiveness and generalization ability of the proposed framework.

List of keywords

Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Information retrieval
Natural Language Processing -> NLP: Applications

292

Recognizable Information Bottleneck

Yilin Lyu, Xin Liu, Mingyang Song, Xinyue Wang, Yaxin Peng, Tieyong Zeng, Liping Jing

Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generalization bound. However, it requires the computation of expensive second-order curvature, which hinders its practical application. In this paper, we establish the connection between the recognizability of representations and the recent functional conditional mutual information (f-CMI) generalization bound, which is significantly easier to estimate. On this basis we propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic optimized by density ratio matching under the Bregman divergence. Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.

List of keywords

Machine Learning -> ML: Representation learning
Machine Learning -> ML: Classification

298

3D Surface Super-resolution from Enhanced 2D Normal Images: A Multimodal-driven Variational AutoEncoder Approach

Wuyuan Xie, Tengcong Huang, Miaohui Wang

3D surface super-resolution is an important technical tool in virtual reality, which is also a research hotspot in the field of computer vision. Due to the unstructured and irregular nature of 3D object data, it is usually difficult to obtain high-quality surface details and geometry textures via a low-cost hardware setup. In this paper, we establish a multimodal-driven variational autoencoder (mmVAE) framework to perform 3D surface enhancement based on 2D normal images. To fully leverage the multimodal learning, we investigate a multimodal Gaussian mixture model (mmGMM) to align and fuse the latent feature representations from different modalities, and further propose a cross-scale encoder-decoder structure to reconstruct high-resolution normal images. Experimental results on several benchmark datasets demonstrate that our method delivers promising surface geometry structures and details in comparison with competitive advances.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: 3D computer vision

299

Model Conversion via Differentially Private Data-Free Distillation

Bochao Liu, Pengju Wang, Shikun Li, Dan Zeng, Shiming Ge

While massive valuable deep models trained on large-scale data have been released to facilitate the artificial intelligence community, they may encounter attacks in deployment that lead to privacy leakage of training data. In this work, we propose a learning approach termed differentially private data-free distillation (DPDFD) for model conversion that can convert a pretrained model (teacher) into its privacy-preserving counterpart (student) via an intermediate generator without access to training data. The learning involves three parties working in a unified way. First, massive synthetic data are generated with the generator. Then, they are fed into the teacher and student to compute differentially private gradients by normalizing the gradients and adding noise before performing descent. Finally, the student is updated with these differentially private gradients and the generator is updated by taking the student as a fixed discriminator in an alternate manner. In addition to a privacy-preserving student, the generator can generate synthetic data in a differentially private way for other downstream tasks. We theoretically prove that our approach can guarantee differential privacy and good convergence. Extensive experiments, in which our approach significantly outperforms other differentially private generative approaches, demonstrate its effectiveness.

List of keywords

Data Mining -> DM: Privacy-preserving data mining
Computer Vision -> CV: Bias, fairness and privacy
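
The "normalize the gradients, add noise, then descend" step mentioned in the DPDFD abstract can be sketched as follows. This is only an illustration of the general pattern, not the authors' implementation: the unit L2 normalization, the Gaussian noise, and the names `privatize_gradient`, `descent_step`, and `noise_std` are all assumptions, and the noise scale required for a given privacy budget is not derived here.

```python
import math
import random


def privatize_gradient(grad, noise_std, rng):
    """Scale `grad` to unit L2 norm, then add Gaussian noise to each coordinate."""
    norm = math.sqrt(sum(g * g for g in grad))
    # A zero gradient cannot be normalized; it is passed through unchanged.
    unit = [g / norm for g in grad] if norm > 0 else list(grad)
    return [u + rng.gauss(0.0, noise_std) for u in unit]


def descent_step(params, grad, lr, noise_std, rng):
    """One gradient-descent update that uses the noisy, normalized gradient."""
    noisy = privatize_gradient(grad, noise_std, rng)
    return [p - lr * g for p, g in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [1.0, -2.0]
    grad = [3.0, 4.0]  # hypothetical gradient with L2 norm 5
    print(descent_step(params, grad, lr=0.1, noise_std=0.5, rng=rng))
```

Normalizing first bounds each gradient's influence on the update, which is what lets a fixed noise scale mask any single contribution.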

307

Self-supervised Neuron Segmentation with Multi-Agent Reinforcement Learning

Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong

The performance of existing supervised neuron segmentation methods is highly dependent on the amount of accurate annotations, especially when applied to large scale electron microscope (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pre-training inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model is able to capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Self-supervised Learning

326

TDG4Crowd: Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing

Yili Fang, Chaojie Shen, Huamao Gu, Tao Han, Xinyi Ding

In crowdsourcing, existing efforts mainly use real datasets collected from crowdsourcing as test datasets to evaluate the effectiveness of aggregation algorithms. However, these works ignore the fact that the datasets obtained by crowdsourcing are usually sparse and imbalanced due to limited budgets. As a result, applying the same aggregation algorithm to different datasets often leads to contradictory conclusions. For example, on the RTE dataset, the Dawid and Skene model performs significantly better than Majority Voting, while on the LabelMe dataset, the experiments give the opposite conclusion. It is challenging to obtain comprehensive and balanced datasets at a low cost. To the best of our knowledge, little effort has been made toward the fair evaluation of aggregation algorithms. To fill this gap, we propose a novel method named TDG4Crowd that can automatically generate comprehensive and balanced datasets. Evaluated with the Kullback-Leibler divergence and the Kolmogorov–Smirnov test, the experimental results show the superiority of our method compared with others. Aggregation algorithms also perform more consistently on the synthetic datasets generated using our method.

List of keywords

Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Cost-sensitive learning

328

Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

Carlos Martin, Tuomas Sandholm

We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients. Such game access is common in reinforcement learning settings, where the environment is typically treated as a black box. To tackle this problem, we apply zeroth-order optimization techniques that combine smoothed gradient estimators with equilibrium-finding dynamics. We model players’ strategies using artificial neural networks. In particular, we use randomized policy networks to model mixed strategies. These take noise in addition to an observation as input and can flexibly represent arbitrary observation-dependent, continuous-action distributions. Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria. We evaluate the performance of our method using an approximation of the Nash convergence metric from game theory, which measures how much players can benefit from unilaterally changing their strategy. We apply our method to continuous Colonel Blotto games, single-item and multi-item auctions, and a visibility game. The experiments show that our method can quickly find a high-quality approximate equilibrium. Furthermore, they show that the dimensionality of the input noise is crucial for performance. To our knowledge, this paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
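
The Nash convergence metric mentioned in the abstract above measures how much players can gain by deviating unilaterally. As a rough illustration only, the sketch below computes it exactly for a small two-player matrix game; the paper itself works with continuous-action games and an approximation of the metric, so the finite setting, the function names, and the matching-pennies payoffs are all assumptions made for this example.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff when the row player mixes with x and the column player with y."""
    return sum(
        x[i] * y[j] * payoff[i][j]
        for i in range(len(x))
        for j in range(len(y))
    )


def nash_conv(payoff_a, payoff_b, x, y):
    """Sum over both players of the gain from a best unilateral deviation."""
    value_a = expected_payoff(payoff_a, x, y)
    value_b = expected_payoff(payoff_b, x, y)
    # Best pure response of each player against the other's fixed mixed strategy.
    best_a = max(
        sum(y[j] * payoff_a[i][j] for j in range(len(y))) for i in range(len(x))
    )
    best_b = max(
        sum(x[i] * payoff_b[i][j] for i in range(len(x))) for j in range(len(y))
    )
    return (best_a - value_a) + (best_b - value_b)


if __name__ == "__main__":
    # Matching pennies: zero-sum, no pure-strategy equilibrium.
    a = [[1, -1], [-1, 1]]
    b = [[-1, 1], [1, -1]]
    print(nash_conv(a, b, [0.5, 0.5], [0.5, 0.5]))  # uniform mixing
    print(nash_conv(a, b, [1.0, 0.0], [1.0, 0.0]))  # both play the first action
```

A value of zero means no player can improve alone, so the strategy profile is a Nash equilibrium; larger values indicate a profile further from equilibrium.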

339

SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations

Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim

Robotic agents trained using reinforcement learning have the problem of taking unreliable actions in an out-of-distribution (OOD) state. Agents can easily become OOD in real-world environments because it is almost impossible for them to visit and learn the entire state space during training. Unfortunately, unreliable actions do not ensure that agents perform their original tasks successfully. Therefore, agents should be able to recognize whether they are in OOD states and learn how to return to the learned state distribution rather than continue to take unreliable actions. In this study, we propose a novel method for retraining agents to recover from OOD situations in a self-supervised manner when they fall into OOD states. Our in-depth experimental results demonstrate that our method substantially improves the agent’s ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. Moreover, we show that our method can retrain the agent to recover from OOD situations even when in-distribution states are difficult to visit through exploration. Code and supplementary materials are available at https://github.com/SNUChanKim/SeRO.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Self-supervised Learning
Robotics -> ROB: Learning in robotics

345

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang

The two-stage point-to-box network plays a critical role in the recent popular 3D Siamese tracking paradigm, which first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. To address these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network’s discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Motion and tracking
Robotics -> ROB: Robotics and vision

350

Calibrating a Deep Neural Network with Its Predecessors

Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu

Confidence calibration – the process to calibrate the output probability distribution of neural networks – is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitations of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves the state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift. Supplementary material and code are available at https://github.com/Linwei94/PCS.

List of keywords

Machine Learning -> ML: Classification

375

Spike Count Maximization for Neuromorphic Vision Recognition

Jianxiong Tang, Jian-Huang Lai, Xiaohua Xie, Lingxiao Yang

Spiking Neural Networks (SNNs) are promising models for neuromorphic vision recognition. The mean square error (MSE) and cross-entropy (CE) losses are widely applied to supervise the training of SNNs on neuromorphic datasets. However, the relevance between the output spike counts and predictions is not well modeled by the existing loss functions. This paper proposes a Spike Count Maximization (SCM) training approach for the SNN-based neuromorphic vision recognition model based on optimizing the output spike counts. The SCM is achieved by structural risk minimization (SRM) and a specially designed spike counting loss. The spike counting loss counts the output spikes of the SNN by using the L0-norm, and the SRM maximizes the distance between the margin boundaries of the classifier to ensure the generalization of the model. The SCM is non-smooth and non-differentiable, and we design an iterative algorithm with fast convergence to solve the problem. Experimental results demonstrate that the SCM performs satisfactorily in most cases. Using the output spikes for prediction, the accuracies of SCM are 2.12%~16.50% higher than the popular training losses on the CIFAR10-DVS dataset. The code is available at https://github.com/TJXTT/SCM-SNN.

List of keywords

Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
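
The link between the L0-norm and output spike counts in the SCM abstract can be made concrete with a small sketch. It only shows that the L0 "norm" of a binary spike train equals its spike count, and that a prediction can be read off as the output neuron with the most spikes. It is not the SCM loss or its iterative solver, and the function names and spike trains are hypothetical.

```python
def spike_count(spike_train):
    """L0 'norm' of a spike train: the number of non-zero entries."""
    return sum(1 for s in spike_train if s != 0)


def predict(output_spikes):
    """Return (predicted class, per-class spike counts).

    `output_spikes[k]` is the spike train of output neuron k over all time steps.
    Ties are broken in favour of the lowest class index.
    """
    counts = [spike_count(train) for train in output_spikes]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts


if __name__ == "__main__":
    # Hypothetical output of a 3-class SNN over 4 time steps.
    spikes = [
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
    ]
    print(predict(spikes))
```

Because the count changes only in integer jumps, an objective built on it is non-smooth, which is consistent with the abstract's remark that a dedicated iterative algorithm is needed.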

379

Eliminating the Computation of Strongly Connected Components in Generalized Arc Consistency Algorithm for AllDifferent Constraint

Luhan Zhen, Zhanshan Li, Yanzhi Li, Hongbo Li

The AllDifferent constraint is widely used in Constraint Programming to model real-world problems. Existing Generalized Arc Consistency (GAC) algorithms map an AllDifferent constraint onto a bipartite graph and utilize the structure of Strongly Connected Components (SCCs) in the graph to filter values. Calculating SCCs is time-consuming in the existing algorithms, so we propose a novel GAC algorithm for the AllDifferent constraint in this paper, which eliminates the computation of SCCs. We prove that all redundant edges in the bipartite graph point to some alternating cycles. Our algorithm exploits this property and uses a more efficient method to filter values, which is based on breadth-first search. Experimental results on the XCSP3 benchmark suite show that our algorithm considerably outperforms the state-of-the-art GAC algorithms.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Constraint Satisfaction and Optimization -> CSO: Constraint programming
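
To make concrete what GAC filtering for AllDifferent computes, here is a deliberately naive sketch: a value is kept for a variable only if some matching in the variable-value bipartite graph assigns every variable a distinct value while using that pair. This is neither the SCC-based approach nor the paper's breadth-first-search method; it reruns a simple augmenting-path matching for every candidate pair, and all names are hypothetical.

```python
def has_full_matching(domains, forced_var=None, forced_val=None):
    """Return True if every variable can take a distinct value.

    If `forced_var` is given, that variable may only take `forced_val`.
    """
    match = {}  # value -> variable currently assigned to it

    def candidates(var):
        return [forced_val] if var == forced_var else domains[var]

    def try_assign(var, seen):
        # Augmenting-path search: take a free value, or evict the current
        # owner of a value if that owner can move to another value.
        for val in candidates(var):
            if val in seen:
                continue
            seen.add(val)
            if val not in match or try_assign(match[val], seen):
                match[val] = var
                return True
        return False

    return all(try_assign(var, set()) for var in domains)


def filter_alldifferent(domains):
    """Keep only the values that appear in at least one all-different solution."""
    return {
        var: [val for val in vals if has_full_matching(domains, var, val)]
        for var, vals in domains.items()
    }


if __name__ == "__main__":
    # x and y must share the values 1 and 2, so z is forced to 3.
    print(filter_alldifferent({"x": [1, 2], "y": [1, 2], "z": [1, 2, 3]}))
```

The repeated matching makes this far slower than dedicated propagators, but it gives a simple reference result against which a faster filtering algorithm can be checked.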

382

Learning to Learn from Corrupted Data for Few-Shot Learning

Yuexuan An, Xingyu Zhao, Hui Xue

Few-shot learning, which aims to generalize knowledge learned from annotated base training data to recognize unseen novel classes, has attracted considerable attention. Existing few-shot methods rely on completely clean training data. However, in the real world, the training data are always corrupted and accompanied by noise due to the disturbance in data transmission and low-quality annotation, which severely degrades the performance and generalization capability of few-shot models. To address the problem, we propose a unified peer-collaboration learning (PCL) framework to extract valid knowledge from corrupted data for few-shot learning. PCL leverages two modules to mimic the peer collaboration process, which cooperatively evaluates the importance of each sample. Specifically, each module first estimates the importance weights of different samples by encoding the information provided by the other module from both global and local perspectives. Then, both modules leverage the obtained importance weights to guide the reevaluation of the loss value of each sample. In this way, the peers can mutually absorb knowledge to improve the robustness of few-shot models. Experiments verify that our framework combined with different few-shot methods can significantly improve the performance and robustness of original models.

List of keywords

Machine Learning -> ML: Few-shot learning

389

Why Rumors Spread Fast in Social Networks, and How to Stop It

Ahad N. Zehmakan, Charlotte Out, Sajjad Hesamipour Khelejan

We study a rumor spreading model where individuals are connected via a network structure. Initially, only a small subset of the individuals are spreading a rumor. Each individual who is connected to a spreader starts spreading the rumor with some probability as a function of their trust in the spreader, quantified by the Jaccard similarity index. Furthermore, the probability that a spreader diffuses the rumor decreases over time until they fully lose their interest and stop spreading. We focus on determining the graph parameters which govern the magnitude and pace of the rumor's spread in this model. We prove that for the rumor to spread to a sizable fraction of the individuals, the network needs to enjoy “strong” expansion properties and most nodes should be in “well-connected” communities. Both of these characteristics are, arguably, present in real-world social networks up to a certain degree, shedding light on the driving force behind the extremely fast spread of rumors in social networks. Furthermore, we formulate a large range of countermeasures to stop the spread of a rumor. We introduce four fundamental criteria which a countermeasure ideally should possess. We evaluate all the proposed countermeasures by conducting experiments on real-world social networks such as Facebook and Twitter. We conclude that our novel decentralized countermeasures (which are executed by the individuals) generally outperform the previously studied centralized ones (which need to be imposed by a third entity such as the government).
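
As a rough illustration of the trust measure named in the abstract, the sketch below computes the Jaccard similarity index between the neighbor sets of two nodes. The adjacency-dictionary layout and the function name `jaccard_trust` are assumptions made for this example; the paper's exact definition (for instance, whether a node counts as its own neighbor, or how similarity maps to a spreading probability) is not given in this listing.

```python
def jaccard_trust(adj, u, v):
    """Jaccard index |N(u) & N(v)| / |N(u) | N(v)| of two neighbor sets."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    if not union:
        # Two isolated nodes share nothing; treat their similarity as 0.
        return 0.0
    return len(nu & nv) / len(union)


# Hypothetical toy network: node -> set of neighbors.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(jaccard_trust(adj, "a", "b"))
```

Under this reading, two individuals who share many contacts get a value close to 1, which the model would translate into a higher chance of passing the rumor on.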

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MDA: Web and social networks

396

A Large-Scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement

Zinuo Li, Xuhang Chen, Shuqiang Wang, Chi-Man Pun

Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. Numerous datasets that have emerged in the field of image enhancement so far are not film-specific. In order to facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet based on the Laplacian Pyramid for stylizing images across frequency bands and achieving film style outcomes. Experiments reveal that the performance of our model is superior to state-of-the-art techniques. Our dataset and code are available at https://github.com/CXH-Research/FilmNet.
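
To make the phrase "across frequency bands" concrete, here is a minimal sketch of a Laplacian pyramid on a grayscale image. It is not FilmNet: it only shows the decomposition into high-frequency residuals plus a low-frequency base, and it swaps in 2x2 box averaging for the Gaussian filter of the classic Laplacian pyramid to stay short. All function names are hypothetical.

```python
import numpy as np


def downsample(img):
    """Halve resolution by averaging each 2x2 block (H and W must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double resolution by repeating each pixel into a 2x2 block."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def build_pyramid(img, levels):
    """Return [residual_0, ..., residual_{levels-1}, low-frequency base]."""
    bands = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # The residual holds the detail lost by going down one level.
        bands.append(current - upsample(smaller))
        current = smaller
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert build_pyramid by adding residuals back from coarse to fine."""
    current = bands[-1]
    for residual in reversed(bands[:-1]):
        current = upsample(current) + residual
    return current


rng = np.random.default_rng(0)
image = rng.random((8, 8))
bands = build_pyramid(image, levels=2)
restored = reconstruct(bands)
print([band.shape for band in bands])
print(np.allclose(restored, image))
```

Because the decomposition is invertible, a stylization model can in principle edit each band separately and then recombine them, which is one plausible reading of "multi-frequency driven" enhancement.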

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision

398

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

Xuewei Li, Tao Wu, Gaoang Wang, Zhongang Qi, Ying Shan, Xi Li

As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of original 360 degree data. Their performance drops considerably when the input panoramic images contain 3D disturbance. To be more robust to 3D disturbance, we propose our Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules, i.e., spherical geometry-aware image projection, spherical deformable patch embedding, and a panorama-aware loss, which takes input images with 3D disturbance into account, adds a spherical geometry-aware constraint on the existing deformable patch embedding, and indicates the pixel density of original 360 degree data, respectively. Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude.

List of keywords

Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Recognition (object detection, categorization)

399

Asynchronous Communication Aware Multi-Agent Task Allocation

Ben Rachmut, Sofia Amador Nelke, Roie Zivan

Multi-agent task allocation problems in physical environments with spatial and temporal constraints are hard problems that are relevant in many realistic applications. A task allocation algorithm based on Fisher market clearing (FMC_TA), which can be performed either centrally or distributively, has been shown to produce high-quality allocations in comparison to both centralized and distributed state-of-the-art incomplete optimization algorithms. However, the algorithm is synchronous and therefore depends on perfect communication between agents. We propose FMC_ATA, an asynchronous version of FMC_TA, which is robust to message latency and message loss. In contrast to the former version of the algorithm, FMC_ATA allows agents to identify dynamic events and initiate the generation of an updated allocation. Thus, it is better suited to dynamic environments. We further investigate the conditions in which the distributed version of the algorithm is preferred over the centralized version. Our results indicate that the proposed asynchronous distributed algorithm produces consistent results even when the communication level is extremely poor.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Agent communication
Constraint Satisfaction and Optimization -> CSO: Distributed constraints

400

Measuring Acoustics with Collaborative Multiple Agents

Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by setting up a loudspeaker and microphone in the environment for all source/receiver locations, which is time-consuming and inefficient. We propose to let two robots measure the environment’s acoustics by actively moving and emitting/receiving sweep signals. We also devise a collaborative multi-agent policy where these two robots are trained to explore the environment’s acoustics while being rewarded for wide exploration and accurate prediction. We show that the robots learn to collaborate and move to explore environment acoustics while minimizing the prediction error. To the best of our knowledge, we present the very first problem formulation and solution to the task of collaborative environment acoustics measurements with multiple agents.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Cooperative games

408

Learning Object Consistency and Interaction in Image Generation from Scene Graphs

Yangkang Zhang, Chenye Meng, Zejian Li, Pei Chen, Guang Yang, Changyuan Yang, Lingyun Sun

This paper is concerned with synthesizing images conditioned on a scene graph (SG), a set of object nodes and their edges of interactive relations. We divide existing works into image-oriented and code-oriented methods. In our analysis, the image-oriented methods do not consider object interaction in spatial hidden features. On the other hand, our empirical study shows that the code-oriented methods lose object consistency, as their generated images miss certain objects in the input scene graph. To alleviate these two issues, we propose Learning Object Consistency and Interaction (LOCI). To preserve object consistency, we design a consistency module with a weighted augmentation strategy for objects that are easily ignored and a matching loss between scene graphs and image codes. To learn object interaction, we design an interaction module consisting of three kinds of message propagation between the input scene graph and the learned image code. Experiments on COCO-stuff and Visual Genome datasets show that our proposed method reduces the omission of objects and outperforms the state-of-the-art on visual fidelity of generated images and objects.

List of keywords

Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Scene analysis and understanding

429

VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs

Jiakai Sun, Zhanjie Zhang, Jiafu Chen, Guangyuan Li, Boyan Ji, Lei Zhao, Wei Xing

Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using the voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance field in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Applications

434

Unreliable Partial Label Learning with Recursive Separation

Yu Shi, Ning Xu, Hua Yuan, Xin Geng

Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, and among which only one is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, the self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.

List of keywords

Machine Learning -> ML: Weakly supervised learning

435

ViT-CX: Causal Explanation of Vision Transformers

Weiyan Xie, Xiao-Hui Li, Caleb Chen Cao, Nevin Zhang

Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specially for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs such as causal overdetermination are considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job revealing all important evidence for the predictions than previous methods. The explanation generated by ViT-CX also shows significantly better faithfulness to the model.

List of keywords

Computer Vision -> CV: Interpretability and transparency
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning

449

CTW: Confident Time-Warping for Time-Series Label-Noise Learning

Peitian Ma, Zhen Liu, Junhao Zheng, Linghao Wang, Qianli Ma

Noisy labels seriously degrade the generalization ability of Deep Neural Networks (DNNs) in various classification tasks. Existing studies on label-noise learning mainly focus on computer vision. However, time series also suffer from the same issue. Directly applying the methods from computer vision to time series may reduce the temporal dependency due to different data characteristics. How to make use of the properties of time series to enable DNNs to learn robust representations in the presence of noisy labels has not been fully explored. To this end, this paper proposes a method that expands the distribution of Confident instances by Time-Warping (CTW) to learn robust representations of time series. Specifically, since applying the augmentation method to all data may introduce extra mislabeled data, we select confident instances to implement Time-Warping. In addition, we normalize the distribution of the training loss of each class to eliminate the model’s selection preference for instances of different classes, alleviating the class imbalance caused by sample selection. Extensive experimental results show that CTW achieves state-of-the-art performance on the UCR datasets when dealing with different types of noise. Besides, the t-SNE visualization of our method verifies that augmenting confident data improves the generalization ability.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Time series and data streams

451

Invertible Residual Neural Networks with Conditional Injector and Interpolator for Point Cloud Upsampling

Yaqi Duan, Aihua Mao, Yu-Hui Wen, Zihui Du, Hongmin Cai, Yong-Jin Liu

Point clouds obtained by LiDAR and other sensors are usually sparse and irregular. Low-quality point clouds have serious influence on the final performance of downstream tasks. Recently, a point cloud upsampling network with normalizing flows has been proposed to address this problem. However, the network heavily relies on designing specialized architectures to achieve invertibility. In this paper, we propose a novel invertible residual neural network for point cloud upsampling, called PU-INN, which allows unconstrained architectures to learn more expressive feature transformations. Then, we propose a conditional injector to improve nonlinear transformation ability of the neural network while guaranteeing invertibility. Furthermore, a lightweight interpolator is proposed based on semantic similarity distance in the latent space, which can intuitively reflect the interpolation changes in Euclidean space. Qualitative and quantitative results show that our method outperforms the state-of-the-art works in terms of distribution uniformity, proximity-to-surface accuracy, 3D reconstruction quality, and computation efficiency.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Probabilistic machine learning

460

ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds

Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma

In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Unlike commonly adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-Frame Fusion block that learns valid compensation between point cloud frames automatically to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and thereby to predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks.

List of keywords

Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Scene analysis and understanding

479

Causal Deep Reinforcement Learning using Observational Data

Wenxuan Zhu, Chao Yu, Qiang Zhang

Deep reinforcement learning (DRL) requires the collection of plenty of interventional data, which is sometimes expensive and even unethical in the real world, such as in autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference

485

KDLGT: A Linear Graph Transformer Framework via Kernel Decomposition Approach

Yi Wu, Yanyang Xu, Wenhao Zhu, Guojie Song, Zhouchen Lin, Liang Wang, Shaoguo Liu

In recent years, graph Transformers (GTs) have been demonstrated as a robust architecture for a wide range of graph learning tasks. However, the quadratic complexity of GTs limits their scalability on large-scale data, in comparison to Graph Neural Networks (GNNs). In this work, we propose the Kernel Decomposition Linear Graph Transformer (KDLGT), an accelerating framework for building scalable and powerful GTs. KDLGT employs the kernel decomposition approach to rearrange the order of matrix multiplication, thereby reducing complexity to linear. Additionally, it categorizes GTs into three distinct types and provides tailored accelerating methods for each category to encompass all types of GTs. Furthermore, we provide a theoretical analysis of the performance gap between KDLGT and self-attention to ensure its effectiveness. Under this framework, we select two representative GTs to design our models. Experiments on both real-world and synthetic datasets indicate that KDLGT not only achieves state-of-the-art performance on various datasets but also reaches an acceleration ratio of approximately 10 on graphs of certain sizes.
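
The idea of "rearranging the order of matrix multiplication" can be sketched with generic kernelized attention. The code below is not KDLGT itself, whose decomposition is not spelled out in the abstract; it only demonstrates the underlying identity (phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V). The feature map `elu(x) + 1` and all function names are assumptions chosen for illustration.

```python
import numpy as np


def feature_map(x):
    """Positive feature map phi(x) = elu(x) + 1 (an assumed choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def quadratic_attention(q, k, v):
    """Builds the full n x n kernel matrix: O(n^2 d) time and O(n^2) memory."""
    scores = feature_map(q) @ feature_map(k).T  # shape (n, n)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def linear_attention(q, k, v):
    """Multiplies phi(K)^T V first: O(n d^2) time, no n x n matrix is formed."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v  # shape (d, d_v)
    normalizer = phi_q @ phi_k.sum(axis=0)  # shape (n,)
    return (phi_q @ kv) / normalizer[:, None]


rng = np.random.default_rng(0)
n, d = 6, 4  # n nodes (tokens), d feature dimensions
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(np.allclose(quadratic_attention(q, k, v), linear_attention(q, k, v)))
```

Both functions return the same output; only the grouping of the products differs, which is what makes the cost grow linearly rather than quadratically with the number of nodes.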

List of keywords

Data Mining -> DM: Big data and scalability
Data Mining -> DM: Mining graphs

490

Physics-Guided Human Motion Capture with Pose Probability Modeling

Jingyi Ju, Buzhen Huang, Chen Zhu, Zhihao Li, Yangang Wang

Incorporating physics in human motion capture to avoid artifacts like floating, foot sliding, and ground penetration is a promising direction. Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module. However, due to depth ambiguity, monocular motion capture inevitably suffers from noise, and the noisy reference often leads to failure of physics-based tracking. To address these obstacles, our key idea is to employ physics as denoising guidance in the reverse diffusion process to reconstruct physically plausible human motion from a modeled pose probability distribution. Specifically, we first train a latent Gaussian model that encodes the uncertainty of 2D-to-3D lifting to facilitate reverse diffusion. Then, a physics module is constructed to track the motion sampled from the distribution. The discrepancies between the tracked motion and image observation are used to provide explicit guidance for the reverse diffusion model to refine the motion. With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion. Experimental results show that our method outperforms previous physics-based methods in both joint accuracy and success rate.

List of keywords

Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: 3D computer vision

526

Constraints First: A New MDD-based Model to Generate Sentences Under Constraints

Alexandre Bonlarron, Aurelie Calabrese, Pierre Kornprobst, Jean-Charles Régin

This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling

534

On the Fairness Impacts of Private Ensembles Models

Cuong Tran, Ferdinando Fioretto

The Private Aggregation of Teacher Ensembles (PATE) is a machine learning framework that enables the creation of private models through the combination of multiple "teacher" models and a "student" model. The student model learns to predict an output based on the voting of the teachers, and the resulting model satisfies differential privacy. PATE has been shown to be effective in creating private models in semi-supervised settings or when protecting data labels is a priority. This paper explores whether the use of PATE can result in unfairness, and demonstrates that it can lead to accuracy disparities among groups of individuals. The paper also analyzes the algorithmic and data properties that contribute to these disproportionate impacts, why these aspects are affecting different groups disproportionately, and offers recommendations for mitigating these effects.
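
For readers unfamiliar with the teacher-voting step, the following sketch shows one common form of it: each teacher predicts a label, noise is added to the per-class vote counts, and the noisy plurality becomes the label the student trains on. The Laplace noise, the `noise_scale` parameter, and the function name `aggregate_votes` are assumptions for illustration; this listing does not state which aggregation mechanism or privacy budget the paper analyzes.

```python
import numpy as np


def aggregate_votes(teacher_labels, num_classes, noise_scale, rng):
    """Return the class with the highest noisy vote count."""
    counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    # Perturb each count so that no single teacher's vote is decisive.
    counts += rng.laplace(loc=0.0, scale=noise_scale, size=num_classes)
    return int(np.argmax(counts))


rng = np.random.default_rng(0)
# Hypothetical predictions of seven teachers for one unlabeled sample.
teacher_labels = np.array([1, 1, 1, 0, 2, 1, 1])
print(aggregate_votes(teacher_labels, num_classes=3, noise_scale=1.0, rng=rng))
```

When the teachers agree strongly, the noise rarely changes the outcome; when the vote is close, the noisy label is more likely to be wrong. That intuition suggests how accuracy could end up differing between groups, although the paper's actual analysis is not reproduced here.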

List of keywords

AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Computer Vision -> CV: Bias, fairness and privacy
Multidisciplinary Topics and Applications -> MDA: Security and privacy

536

BPNet: Bézier Primitive Segmentation on 3D Point Clouds

Rao Fu, Cheng Wen, QIAN LI, Xiao Xiao, Pierre Alliez

This paper proposes BPNet, a novel end-to-end deep learning framework to learn Bézier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We conducted extensive experiments on synthetic datasets (ABC Dataset) and real-scan datasets to validate and compare our approach with different baseline methods. Experiments show superior performance over previous work in terms of segmentation, with a substantially faster testing speed.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Segmentation

539

Rubik’s Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture

Yingjie Li, Weilu Gao, Cunxi Yu

Recently, there have been increasing efforts to advance optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. Given the considerable benefits in computation speed and energy efficiency, there is significant interest in applying ONNs to medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONN architecture, namely RubikONNs, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware, similarly to rotating a Rubik’s Cube. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms, RotAgg and RotSeq, are proposed. Our experimental results demonstrate more than 4x improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.

List of keywords

Multidisciplinary Topics and Applications -> MDA: AI hardware
Machine Learning -> ML: Classification
Multidisciplinary Topics and Applications -> MDA: Physical sciences

541

On Approximating Total Variation Distance

Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S Meel, Dimitrios Myrisiotis, A Pavan, Vinodchandran N. Variyam

Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms. 2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
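
The contrast drawn in result 1 can be seen in a small sketch, assuming each product distribution over {0,1}^n is given as a list of Bernoulli parameters. The TV distance is computed here by brute force over all 2^n points, while the squared Hellinger distance factorizes over the coordinates and needs only n terms. Function names are hypothetical, and the brute-force routine is not the paper's FPTAS.

```python
import itertools
import math


def prob(marginals, x):
    """Probability of bit string x under a product of Bernoulli marginals."""
    return math.prod(p if bit else 1.0 - p for p, bit in zip(marginals, x))


def tv_distance_bruteforce(p, q):
    """TV distance as half the L1 distance, summed over all 2^n points."""
    n = len(p)
    return 0.5 * sum(
        abs(prob(p, x) - prob(q, x))
        for x in itertools.product((0, 1), repeat=n)
    )


def squared_hellinger(p, q):
    """Squared Hellinger distance via per-coordinate factors; linear in n."""
    # The Bhattacharyya coefficient of a product is the product of
    # the per-coordinate coefficients, which is what "tensorizes" means here.
    coefficient = math.prod(
        math.sqrt(pi * qi) + math.sqrt((1.0 - pi) * (1.0 - qi))
        for pi, qi in zip(p, q)
    )
    return 1.0 - coefficient


p = [0.5, 0.5, 0.5]  # uniform distribution on {0,1}^3
q = [0.9, 0.2, 0.6]
print(tv_distance_bruteforce(p, q))
print(squared_hellinger(p, q))
```

The absolute value inside the TV sum prevents a coordinate-wise factorization of this kind, which is consistent with the hardness of exact computation stated in the abstract.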

List of keywords

Machine Learning -> ML: Other

547

Ensemble Reinforcement Learning in Continuous Spaces — A Hierarchical Multi-Step Approach for Policy Training

Gang Chen, Victoria Huang

Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research has shown that actor-critic DRL algorithms often fail to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have been proposed lately to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Ensemble methods

555

Strategic Adversarial Attacks in AI-assisted Decision Making to Reduce Human Trust and Reliance

Zhuoran Lu, Zhuoyan Li, Chun-Wei Chiang, Ming Yin

With the increased integration of AI technologies in human decision making processes, adversarial attacks on AI models become a greater concern than ever before as they may significantly hurt humans’ trust in AI models and decrease the effectiveness of human-AI collaboration. While many adversarial attack methods have been proposed to decrease the performance of an AI model, limited attention has been paid to understanding how these attacks will impact the human decision makers interacting with the model, and accordingly, how to strategically deploy adversarial attacks to maximize the reduction of human trust and reliance. In this paper, through a human-subject experiment, we first show that in AI-assisted decision making, the timing of the attacks largely influences how much humans decrease their trust in and reliance on AI; the decrease is particularly salient when attacks occur on decision making tasks in which humans are highly confident in themselves. Based on these insights, we next propose an algorithmic framework to infer the human decision maker’s hidden trust in the AI model and dynamically decide when the attacker should launch an attack on the model. Our evaluations show that following the proposed approach, attackers deploy more efficient attacks and achieve higher utility than adopting other baseline strategies.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-computer interaction

580

Robust Steganography without Embedding based on Secure Container Synthesis and Iterative Message Recovery

Ziping Ma, Yuesheng Zhu, Guibo Luo, Xiyao Liu, Gerald Schaefer, Hui Fang

Synthesis-based steganography without embedding (SWE) methods transform secret messages to container images synthesised by generative networks, which avoids the distortions of container images and thus can fundamentally resist typical steganalysis tools. However, existing methods suffer from weak message recovery robustness, limited synthesis fidelity, and the risk of message leakage. To solve these problems, we propose a novel robust steganography without embedding method in this paper. In our method, we design a secure weight-modulation-based generator by introducing secure factors to hide secret messages in synthesised container images. In this manner, the synthesised results are modulated by secure factors and thus the secret messages are inaccessible when using fake factors, which reduces the risk of message leakage. Furthermore, we design a difference predictor via the reconstruction of tampered container images together with an adversarial training strategy to iteratively update the estimation of hidden messages. In this manner, the robustness of recovering hidden messages is ensured, and the degradation of synthesis fidelity is reduced since the generator is not included in the adversarial training. Extensive experimental results have demonstrated that our designed method is effective at avoiding message leakage and superior to other existing methods in terms of recovery robustness and synthesis fidelity.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Security and privacy
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs

586

PasCore: A Chinese Overlapping Relation Extraction Model Based on Global Pointer Annotation Strategy

Peng Wang, Jiafeng Xie, Xiye Chen, Guozheng Li, Wei Li

Recent work on extracting relations from texts has achieved excellent performance. However, existing studies mainly focus on simple relation extraction, and these methods do not perform well on the overlapping triple problem because the tags of shared entities conflict with each other. In particular, overlapping entities are common and indispensable in Chinese. To address this issue, this paper proposes PasCore, which utilizes a global pointer annotation strategy for overlapping relation extraction in Chinese. PasCore first obtains the sentence vector via a general pre-trained model encoder, and uses a classifier to predict relations. Subsequently, it uses the global pointer annotation strategy for head entity annotation, which uses global tags to label the start and end positions of the entities. Finally, PasCore integrates the relation, head entity and its type to mark the tail entity. Furthermore, PasCore performs conditional layer normalization to fuse features, which connects all stages and greatly enriches the association between relations and entities. Experimental results on both Chinese and English real-world datasets demonstrate that PasCore outperforms strong baselines on relation extraction and, especially, shows superior performance on overlapping relation extraction.

List of keywords

Natural Language Processing -> NLP: Information extraction
Data Mining -> DM: Knowledge graphs and knowledge base completion

596

Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction

Pengcheng Lei, Faming Fang, Guixu Zhang, Ming Xu

Magnetic resonance imaging (MRI) tasks often involve multiple contrasts. Recently, numerous deep learning-based multi-contrast MRI super-resolution (SR) and reconstruction methods have been proposed to explore the complementary information from the multi-contrast images. However, these methods either construct parameter-sharing networks or manually design fusion rules, failing to accurately model the correlations between multi-contrast images and lacking interpretability. In this paper, we propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm with a well-designed data fidelity term. Specifically, we build an observation model for the multi-contrast MR images to explicitly model the multi-contrast images as common features and unique features. In this way, only the useful information in the reference image can be transferred to the target image, while the inconsistent information will be ignored. We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model. In particular, the proximal operators are replaced by a learnable ResNet. In addition, multi-scale dictionaries are introduced to further improve the model performance. We test our MC-CDic model on multi-contrast MRI SR and reconstruction tasks. Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods. Code is available at https://github.com/lpcccc-cv/MC-CDic.

List of keywords

Computer Vision -> CV: Biomedical image analysis
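
For readers unfamiliar with the proximal gradient algorithm mentioned in the abstract above, here is a minimal Python sketch of the classical version (ISTA) applied to an l1-regularized least-squares problem. It is not the MC-CDic model: the objective, the soft-thresholding proximal operator, and all names below are illustrative assumptions. In an unrolled network of the kind the abstract describes, each loop iteration would become a layer and the hand-written proximal step would be replaced by a learned module.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - t, 0.0) for x in v]


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ista(A, y, lam, step, iters=50):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by proximal gradient steps.

    `step` should be at most 1 / L, where L is the largest eigenvalue of A^T A.
    """
    x = [0.0] * len(A[0])
    At = transpose(A)
    for _ in range(iters):
        residual = [r - t for r, t in zip(matvec(A, x), y)]
        grad = matvec(At, residual)                     # gradient of the smooth term
        z = [xi - step * gi for xi, gi in zip(x, grad)]  # gradient step
        x = soft_threshold(z, step * lam)                # proximal step
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # With A = I the minimizer is simply the soft-thresholded observation.
    print(ista(identity, [3.0, -0.5, 1.2], lam=1.0, step=1.0))
```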

604

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with natural language descriptions. Current methods either fail to leverage the local details or are computationally expensive. Worse, they fail to leverage the heterogeneous concepts in data. In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide the coarse feature into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts corresponds to a set of textual concepts, we propose an adaptive pooling method to aggregate semantic concepts to address the partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior in efficiency and granularity, ensuring fine-grained interactions at a computational complexity similar to that of coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms the existing state-of-the-art methods.

List of keywords

Computer Vision -> CV: Image and video retrieval
Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Vision and language

607

Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint

Haoming Li, Xinzhuo Lin, Yang Zhou, Xiang Li, Yuchi Huo, Jiming Chen, Qi Ye

3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because the physical contact is sensitive to small changes in pose, the highly nonlinear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications

619

Controlling Neural Style Transfer with Deep Reinforcement Learning

Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu

Controlling the degree of stylization in Neural Style Transfer (NST) is tricky since it usually requires hand-engineering of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps. This makes the method easy for users to control. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Applications
Computer Vision -> CV: Applications

621

Compositional Zero-Shot Artistic Font Synthesis

Xiang Li, Lei Wu, Changshuo Wang, Lei Meng, Xiangxu Meng

Recently, many researchers have made remarkable achievements in the field of artistic font synthesis, with impressive glyph style and effect style in the results. However, due to limited exploration of style disentanglement, existing methods struggle to envision unseen style (glyph-effect) compositions of artistic fonts, and thus can only learn the seen style compositions. To solve this problem, we propose a novel compositional zero-shot artistic font synthesis GAN (CAFS-GAN), which allows the synthesis of unseen style compositions by exploring the visual independence and joint compatibility of encoding semantics between glyph and effect. Specifically, we propose two contrast-based style encoders to achieve style disentanglement, since glyph and effect are intertwined in the image. Meanwhile, to preserve more glyph and effect detail, we propose a generator based on hierarchical dual styles AdaIN to reorganize content-styles representations from structure to texture gradually. Extensive experiments demonstrate the superiority of our model in generating high-quality artistic font images with unseen style compositions against other state-of-the-art methods. The source code and data are available at moonlight03.github.io/CAFS-GAN/.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Neural generative models, auto encoders, GANs

639

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

Zenan Shi, Haipeng Chen, Long Chen, Dong Zhang

In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns. Compared to the existing methods that only focus on the discrepant-specific patterns (e.g., noises, textures, and frequencies), our method generalizes better. Specifically, we first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. DisGE consists of two branches, where the mainstream backbone branch is used to extract general semantic features, and the accessorial discrepant external attention branch is used to extract explicit forgery cues. Besides, a Double-Head Reconstruction (DouHR) module is proposed to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns, such that the forgery detection capability on unknown patterns can be improved. Extensive experimental results on four challenging datasets validate the effectiveness of our proposed method against state-of-the-art competitors.

List of keywords

Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning

644

Dichotomous Image Segmentation with Frequency Prior Knowledge

Yan Zhou, Bo Dong, Yuanfeng Wu, Wentao Zhu, Geng Chen, Yanning Zhang

Dichotomous image segmentation (DIS) has a wide range of real-world applications and has gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative Frequency Prior Knowledge (FPK). Our model, called FPK-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency priors generator to jointly utilize fixed filters and learnable filters to extract informative FPK. Before embedding the FPK into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency priors embedding module to embed the FPK into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FPK-DIS outperforms 17 state-of-the-art methods by a large margin in terms of key evaluation metrics.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Scene analysis and understanding
Robotics -> ROB: Perception

648

HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang, Yue Qi

Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-independent and identically distributed (non-IID) data among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive experiments on four datasets show that HyperFed is effective in enhancing the performance of FL under the non-IID setting.

List of keywords

Machine Learning -> ML: Federated learning
Computer Vision -> CV: Representation learning

656

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle-rendering. The constructed training samples are closely aligned to the testing instances, without the need of data annotation. To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.

List of keywords

Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Vision and language

658

Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection

Supeng Wang, Yuxi Li, Ming Xie, Mingmin Chi, Yabiao Wang, Chengjie Wang, Wenbing Zhu

Change detection is a widely adopted technique in remote sensing imagery (RSI) analysis to discover long-term geomorphic evolution. To highlight the areas of semantic changes, previous efforts mostly pay attention to learning representative feature descriptors of a single image, while the difference information is either modeled with simple difference operations or implicitly embedded in feature interactions. Nevertheless, such difference modeling can be noisy since it suffers from non-semantic changes and lacks explicit guidance from image content or context. In this paper, we revisit the importance of feature difference for change detection in RSI, and propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling (APD). Firstly, alignment leverages the contextual similarity to compensate for non-semantic difference in feature space. Next, a difference module trained with semantic-wise perturbation is adopted to learn more generalized change estimators, which in turn bootstraps feature extraction and prediction. Finally, a decoupled dual-decoder structure is designed to predict semantic changes in both content-aware and content-agnostic manners. Extensive experiments are conducted on the LEVIR-CD, WHU-CD and DSIFN-CD benchmarks, demonstrating that our proposed operations bring significant improvement and achieve competitive results under the same conditions.

List of keywords

Computer Vision -> CV: Scene analysis and understanding

660

Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning

Xinyang Huang, Chuang Zhu, Wenkai Chen

In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples at multiple levels. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA. Our code is available at https://github.com/bupt-ai-cz/ProML.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Recognition (object detection, categorization)

663

Multi-level Graph Contrastive Prototypical Clustering

Yuchao Zhang, Yuan Yuan, Qi Wang

Recently, graph neural networks (GNNs) have drawn a surge of investigations in deep graph clustering. Nevertheless, existing approaches are predominantly semantic-agnostic since GNNs exhibit inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations from different granularities may conflict with each other, yielding severe performance degradation for clustering. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate the issue of conflicting objectives, we propose to learn representations of different granularities within individual feature-, prototypical-, and cluster-level spaces by feature decorrelation, prototype contrast, and cluster space consistency, respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC against the state-of-the-art graph clustering approaches.

List of keywords

Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning

668

Universal Adaptive Data Augmentation

Xiaogang Xu, Hengshuang Zhao

Existing automatic data augmentation (DA) methods either ignore updating DA’s parameters according to the target model’s state during training or adopt update strategies that are not effective enough. In this work, we design a novel data augmentation strategy called “Universal Adaptive Data Augmentation” (UADA). Different from existing methods, UADA adaptively updates DA’s parameters according to the target model’s gradient information during training: given a pre-defined set of DA operations, we randomly decide types and magnitudes of DA operations for every data batch during training, and adaptively update DA’s parameters along the gradient direction of the loss with respect to DA’s parameters. In this way, UADA can increase the training loss of the target networks, and the target networks learn features from harder samples to improve generalization. Moreover, UADA is very general and can be utilized in numerous tasks, e.g., image classification, semantic segmentation and object detection. Extensive experiments with various models are conducted on CIFAR-10, CIFAR-100, ImageNet, tiny-ImageNet, Cityscapes, and VOC07+12 to demonstrate the significant performance improvements brought by UADA.

List of keywords

Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation

669

Graph Propagation Transformer for Graph Representation Learning

Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi

This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e., node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models. The code and models will be made available.

List of keywords

Machine Learning -> ML: Applications
Machine Learning -> ML: Attention models
Machine Learning -> ML: Sequence and graph learning

680

First-Choice Maximality Meets Ex-ante and Ex-post Fairness

Xiaoxi Guo, Sujoy Sikdar, Lirong Xia, Yongzhi Cao, Hanpin Wang

We design randomized mechanisms for assigning multiple indivisible items to a group of agents given their ordinal preferences. We show that our mechanisms output assignments that satisfy desirable combinations of efficiency and fairness properties both ex-ante and ex-post. The generalized eager Boston mechanism is ex-ante envy-free and ex-post envy-free up to one item (EF1). The generalized probabilistic Boston mechanism satisfies EF1 and an ex-ante guarantee of efficiency instead of fairness. Our mechanisms are also ex-post Pareto-efficient and first-choice maximal, i.e., they maximize the number of agents assigned their first choices. In doing so, we expand the frontiers of simultaneously providing efficiency and both ex-ante and ex-post fairness guarantees for the assignment problem.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division

683

Non-Lambertian Multispectral Photometric Stereo via Spectral Reflectance Decomposition

Jipeng Lv, Heng Guo, Guanying Chen, Jinxiu Liang, Boxin Shi

Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image captured under multispectral illuminations. Existing MPS methods adopt the Lambertian reflectance model to make the problem tractable, but it greatly limits their application to real-world surfaces. In this paper, we propose a deep neural network named NeuralMPS to solve the MPS problem under non-Lambertian spectral reflectances. Specifically, we present a spectral reflectance decomposition model to disentangle the spectral reflectance into a geometric component and a spectral component. With this decomposition, we show that the MPS problem for surfaces with a uniform material is equivalent to the conventional photometric stereo (CPS) with unknown light intensities. In this way, NeuralMPS reduces the difficulty of the non-Lambertian MPS problem by leveraging the well-studied non-Lambertian CPS methods. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision

684

Feature Staleness Aware Incremental Learning for CTR Prediction

Zhikai Wang, Yanyan Shen, Zibin Zhang, Kangyi Lin

Click-through Rate (CTR) prediction in real-world recommender systems often deals with billions of user interactions every day. To improve the training efficiency, it is common to update the CTR prediction model incrementally using the new incremental data and a subset of historical data. However, the feature embeddings of a CTR prediction model often get stale when the corresponding features do not appear in current incremental data. In the next period, the model would have a performance degradation on samples containing stale features, which we call the feature staleness problem. To mitigate this problem, we propose a Feature Staleness Aware Incremental Learning method for CTR prediction (FeSAIL) which adaptively replays samples containing stale features. We first introduce a staleness-aware sampling algorithm (SAS) to sample a fixed number of stale samples with high sampling efficiency. We then introduce a staleness-aware regularization mechanism (SAR) for a fine-grained control of the feature embedding updating. We instantiate FeSAIL with a general deep learning-based CTR prediction model, and the experimental results demonstrate that FeSAIL outperforms various state-of-the-art methods on four benchmark datasets. The code can be found at https://github.com/cloudcatcher888/FeSAIL.

List of keywords

Data Mining -> DM: Recommender systems
Machine Learning -> ML: Incremental learning

694

CADParser: A Learning Approach of Sequence Modeling for B-Rep CAD

Shengdi Zhou, Bin Zhou, Tianyi Tang

Computer-Aided Design (CAD) plays an essential role in industrial manufacturing. An entire manufactured object always contains geometry information and the construction workflow. With the construction information, a parametric CAD model can be re-edited effectively. Unlike the mesh or point cloud, boundary representation (B-Rep) is a commercially standard format for the geometry structure. Since there are no uniform criteria to store the construction workflow, JSON format is an alternative. Unfortunately, most manufactured CAD models on the Internet only provide geometry information without the construction procedure, which reduces the efficiency of creation. This paper proposes a learning approach to infer the underlying modeling sequences given a B-Rep CAD model by treating the CAD geometry structure as a graph and the construction workflow as a sequence. Since the existing CAD dataset only contains two operations (i.e., Sketch and Extrusion), limiting the diversity of the CAD model creation, we introduce a large-scale dataset with diverse operations (e.g., Revolution, Fillet, Chamfer). Each model includes both the geometry structure and the construction sequences. Extensive experiments demonstrate that our method outperforms the existing state-of-the-art methods quantitatively and qualitatively.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications

704

FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning

Yuanyuan Chen, Zichen Chen, Pengcheng Wu, Han Yu

Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.

List of keywords

Machine Learning -> ML: Federated learning
Machine Learning -> ML: Learning sparse models
Machine Learning -> ML: Optimization

705

Some Might Say All You Need Is Sum

Eran Rosenbluth, Jan Tönshoff, Martin Grohe

The expressivity of Graph Neural Networks (GNNs) depends on the aggregation functions they employ. Theoretical works have pointed towards Sum aggregation GNNs subsuming all other GNNs, while certain practical works have observed a clear advantage to using Mean and Max. An examination of the theoretical guarantee identifies two caveats. First, it is size-restricted, that is, the power of every specific GNN is limited to graphs of a certain maximal size. Successfully processing larger graphs may require another GNN, and so on. Second, it concerns the power to distinguish non-isomorphic graphs, not the power to approximate general functions on graphs, and the former does not necessarily imply the latter. It is important that a GNN’s usability not be limited to graphs of any certain maximal size. Therefore, we explore the realm of unrestricted-size expressivity. We prove that simple functions, which can be computed exactly by Mean or Max GNNs, are inapproximable by any Sum GNN. We prove that under certain restrictions, every Mean or Max GNN can be approximated by a Sum GNN, but even there, a combination of (Sum, [Mean/Max]) is more expressive than Sum alone. Lastly, we prove further expressivity limitations of Sum-GNNs.
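
The three aggregation functions compared in this abstract can be illustrated on a toy neighborhood. The following is only a minimal sketch, not the paper's construction: the function name `aggregate` and the scalar features are hypothetical, and a real GNN layer would also apply learned transformations. It shows that Sum changes when the number of neighbors grows, while Mean and Max do not.

```python
from statistics import mean


def aggregate(neighbor_features, how):
    """Aggregate a multiset of scalar neighbor features (illustrative only)."""
    if how == "sum":
        return sum(neighbor_features)
    if how == "mean":
        return mean(neighbor_features)
    if how == "max":
        return max(neighbor_features)
    raise ValueError(f"unknown aggregation: {how}")


# Two neighborhoods with the same feature proportions but different sizes.
small = [1.0, 2.0]
large = [1.0, 2.0, 1.0, 2.0]

# For each aggregator, store the pair (value on small, value on large).
results = {
    how: (aggregate(small, how), aggregate(large, how))
    for how in ("sum", "mean", "max")
}

for how, (on_small, on_large) in results.items():
    print(f"{how}: small={on_small}, large={on_large}")
```

In this toy case Sum separates the two neighborhoods, whereas Mean and Max return the same value for both; the abstract's results concern the much subtler question of which functions each aggregator can compute or approximate on graphs of unbounded size.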

List of keywords

Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Learning theory
Machine Learning -> ML: Sequence and graph learning

721

Complexity of Efficient Outcomes in Binary-Action Polymatrix Games and Implications for Coordination Problems

Argyrios Deligkas, Gregory Gutin, Eduard Eiben, Philip Neary, Anders Yeo

We investigate the difficulty of finding economically efficient solutions to coordination problems on graphs. Our work focuses on two forms of coordination: games of strategic complements (pure-coordination) and games of strategic substitutes (anti-coordination). We consider three objectives in the context of simple binary-action polymatrix games: (a) maximizing welfare, (b) maximizing potential, and (c) finding a welfare-maximizing Nash equilibrium. We introduce an intermediate, new graph-partition problem, termed Maximum Weighted Digraph Partition, which is of independent interest, and we provide a dichotomy for it. This dichotomy, among other results, provides as a corollary a dichotomy for objective (a) for general binary-action polymatrix games. In addition, it reveals that the complexity of achieving these objectives varies depending on the form of the coordination problem. Specifically, objectives (a) and (b) can be efficiently solved in pure-coordination games, but are NP-hard in anti-coordination games. Finally, we show that objective (c) is NP-hard even for the simplest non-trivial pure-coordination games.
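
Objective (a), maximizing welfare, can be made concrete with a brute-force sketch on a tiny instance. Everything below is an illustrative assumption rather than the paper's method: the payoff tables, the triangle graph, and the helper names `welfare` and `max_welfare` are hypothetical, and every edge is assumed to carry the same two-player game. The search enumerates all 2^n action profiles, so it is only usable for very small n.

```python
from itertools import product


def welfare(edges, table, actions):
    """Sum of both endpoints' payoffs over all edges of the graph."""
    total = 0
    for u, v in edges:
        payoff_u, payoff_v = table[actions[u]][actions[v]]
        total += payoff_u + payoff_v
    return total


def max_welfare(n, edges, table):
    """Exhaustively search all binary action profiles (exponential in n)."""
    best = max(product((0, 1), repeat=n),
               key=lambda actions: welfare(edges, table, actions))
    return best, welfare(edges, table, best)


# table[a][b] = (payoff to first endpoint, payoff to second endpoint).
coordination = [[(1, 1), (0, 0)], [(0, 0), (1, 1)]]       # reward matching
anti_coordination = [[(0, 0), (1, 1)], [(1, 1), (0, 0)]]  # reward differing

triangle = [(0, 1), (1, 2), (0, 2)]

print(max_welfare(3, triangle, coordination))
print(max_welfare(3, triangle, anti_coordination))
```

On the triangle, all players can match in the coordination game, but at most two of the three edges can be mismatched in the anti-coordination game, which hints at why the two forms behave differently.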

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

728

Dynamic Belief for Decentralized Multi-Agent Cooperative Learning

Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian

Decentralized multi-agent cooperative learning is a practical task due to the partially observed setting both in training and execution. Every agent learns to cooperate without access to the observations and policies of others. However, decentralized training of multiple agents is difficult due to non-stationarity, especially when other agents’ policies are also being learned during training. To overcome this, we propose to learn a dynamic policy belief for each agent to predict the current policies of other agents and condition its own policy accordingly. To quickly adapt to the development of others’ policies, we introduce a historical context to learn the belief inference from a few recent action histories of other agents, and a latent variational inference to model their policies by a learned distribution. We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in the decentralized training settings and comparable results with the state-of-the-art CTDE methods.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

729

The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks

Luca Marzari, Davide Corsi, Ferdinando Cicalese, Alessandro Farinelli

Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving. While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements. In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound of the correct count while significantly reducing computational requirements. We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
AI Ethics, Trust, Fairness -> ETF: Safety and robustness

731

Generalization Guarantees of Self-Training of Halfspaces under Label Noise Corruption

Lies Hadjadj, Massih-Reza Amini, Sana Louhichi

We investigate the generalization properties of a self-training algorithm with halfspaces. The approach learns a list of halfspaces iteratively from labeled and unlabeled training data, in which each iteration consists of two steps: exploration and pruning. In the exploration phase, the halfspace is found sequentially by maximizing the unsigned-margin among unlabeled examples and then assigning pseudo-labels to those that have a distance higher than the current threshold. These pseudo-labels are allegedly corrupted by noise. The training set is then augmented with noisy pseudo-labeled examples, and a new classifier is trained. This process is repeated until no more unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples that have a distance to the last halfspace greater than the associated unsigned-margin are then discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded and show that the resulting semi-supervised approach never degrades performance compared to the classifier learned using only the initial labeled training set. Experiments carried out on a variety of benchmarks demonstrate the efficiency of the proposed approach compared to state-of-the-art methods.

List of keywords

Machine Learning -> ML: Learning theory
Machine Learning -> ML: Semi-supervised learning

736

Sequential Recommendation with Probabilistic Logical Reasoning

Huanhuan Yuan, Pengpeng Zhao, Xuefeng Xian, Guanfeng Liu, Yanchi Liu, Victor S. Sheng, Lei Zhao

Deep learning and symbolic learning are two frequently employed methods in Sequential Recommendation (SR). Recent neural-symbolic SR models demonstrate their potential to enable SR to be equipped with concurrent perception and cognition capacities. However, neural-symbolic SR remains a challenging problem due to open issues like representing users and items in logical reasoning. In this paper, we combine the Deep Neural Network (DNN) SR models with logical reasoning and propose a general framework named Sequential Recommendation with Probabilistic Logical Reasoning (short for SR-PLR). This framework allows SR-PLR to benefit from both similarity matching and logical reasoning by disentangling feature embedding and logic embedding in the DNN and probabilistic logic network. To better capture the uncertainty and evolution of user tastes, SR-PLR embeds users and items with a probabilistic method and conducts probabilistic logical reasoning on users’ interaction patterns. Then the feature and logic representations learned from the DNN and logic network are concatenated to make the prediction. Finally, experiments on various sequential recommendation models demonstrate the effectiveness of the SR-PLR. Our code is available at https://github.com/Huanhuaneryuan/SR-PLR.

List of keywords

Data Mining -> DM: Recommender systems
Data Mining -> DM: Collaborative filtering

747

Learning Few-shot Sample-set Operations for Noisy Multi-label Aspect Category Detection

Shiman Zhao, Wei Chen, Tengjiao Wang

Multi-label Aspect Category Detection (MACD), which aims to identify multiple aspect categories in a given sentence, is essential for aspect-based sentiment analysis. Few-shot MACD is critical due to the scarcity of labeled data. However, MACD is a high-noise task, and existing methods fail to address it with only two or three training samples per class, which limits their application in practice. To solve the above issues, we propose a group of Few-shot Sample-set Operations (FSO) to solve noisy MACD in fewer-sample scenarios by identifying the semantic contents of samples. Learning interactions among intersection, subtraction, and union networks, the FSO imitates arithmetic operations on samples to distinguish relevant and irrelevant aspect contents. Eliminating the negative effect caused by noise, the FSO extracts discriminative prototypes and customizes a dedicated query vector for each class. Besides, we design a multi-label architecture, which integrates score-wise loss and multi-label loss to optimize the FSO for multi-label prediction, avoiding complex threshold training or selection. Experiments show that our method achieves considerable performance. Notably, it improves Macro-F by up to 11.01% and by 8.59% on average in fewer-sample scenarios.

List of keywords

Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Dialogue and interactive systems

752

Federated Probabilistic Preference Distribution Modelling with Compactness Co-Clustering for Privacy-Preserving Multi-Domain Recommendation

Weiming Liu, Chaochao Chen, Xinting Liao, Mengling Hu, Jianwei Yin, Yanchao Tan, Longfei Zheng

With the development of modern internet techniques, Cross-Domain Recommendation (CDR) systems have been widely exploited for tackling the data-sparsity problem. Most current CDR models assume that user-item interactions are accessible across different domains. However, such a knowledge-sharing process breaks privacy protection policies. In this paper, we focus on the Privacy-Preserving Multi-Domain Recommendation problem (PPMDR). The problem is challenging since different domains are sparse and heterogeneous under privacy protection. To tackle the above issues, we propose Federated Probabilistic Preference Distribution Modelling (FPPDM). FPPDM includes two main components, i.e., a local domain modelling component and a global server aggregation component with a federated learning strategy. The local domain modelling component aims to exploit user/item preference distributions using the rating information in the corresponding domain. The global server aggregation component is set to combine user characteristics across domains. To better extract semantic neighbor information among the users, we further provide a compactness co-clustering strategy in FPPDM++ to cluster the users with similar characteristics. Our empirical studies on benchmark datasets demonstrate that FPPDM/FPPDM++ significantly outperforms the state-of-the-art models.

List of keywords

Data Mining -> DM: Recommender systems

757

Mean Payoff Optimization for Systems of Periodic Service and Maintenance

David Klaska, Antonin Kucera, Vit Musil, Vojtech Rehak

Consider oriented graph nodes requiring periodic visits by a service agent. The agent moves among the nodes and receives a payoff for each completed service task, depending on the time elapsed since the previous visit to a node. We consider the problem of finding a suitable schedule for the agent to maximize its long-run average payoff per time unit. We show that the problem of constructing an epsilon-optimal schedule is PSPACE-hard for every fixed non-negative epsilon, and that there exists an optimal periodic schedule of exponential length. We propose randomized finite-memory (RFM) schedules as a compact description of the agent’s strategies and design an efficient algorithm for constructing RFM schedules. Furthermore, we construct deterministic periodic schedules by sampling from RFM schedules.

List of keywords

Planning and Scheduling -> PS: Robot planning
Planning and Scheduling -> PS: Routing
Robotics -> ROB: Motion and path planning

774

RFENet: Towards Reciprocal Feature Evolution for Glass Segmentation

Ke Fan, Changan Wang, Yabiao Wang, Chengjie Wang, Ran Yi, Lizhuang Ma

Glass-like objects are widespread in daily life but remain difficult for most existing methods to segment. Their transparency makes them hard to distinguish from the background, while the tiny separation boundary further impedes the acquisition of their exact contours. In this paper, by revealing the key co-evolution demand of semantic and boundary learning, we propose a Multi-scale Selective Mutual (MSM) module to enable reciprocal feature learning between them. Then, to exploit the global shape context, we propose a Structurally Attentive Refinement (SAR) module to conduct fine-grained feature refinement for the ambiguous points around the boundary. To further utilize the multi-scale information, we design a simple cascaded structure that combines the above two novel modules, and finally introduce the Reciprocal Feature Evolution Network (RFENet) for effective glass-like object segmentation. Extensive experiments demonstrate that our RFENet achieves state-of-the-art performance on three popular public datasets.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Scene analysis and understanding

792

Optimal Seat Arrangement: What Are the Hard and Easy Cases?

Esra Ceylan, Jiehua Chen, Sanjukta Roy

We study four NP-hard optimal seat arrangement problems [Bodlaender et al., 2020a], each of which takes as input a set of n agents, where each agent has cardinal preferences over other agents, and an n-vertex undirected graph (called the seat graph). The task is to assign each agent to a distinct vertex in the seat graph such that either the sum of utilities or the minimum utility is maximized, or the arrangement is envy-free or exchange-stable. Aiming at identifying hard and easy cases, we extensively study the algorithmic complexity of the four problems by looking into natural graph classes for the seat graph (e.g., paths, cycles, stars, or matchings), problem-specific parameters (e.g., the number of non-isolated vertices in the seat graph or the maximum number of agents towards whom an agent has non-zero preferences), and preference structures (e.g., non-negative or symmetric preferences). For strict preferences and seat graphs with disjoint edges and isolated vertices, we correct an error by Bodlaender et al. [2020b] and show that finding an envy-free arrangement remains NP-hard in this case.
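
The first two objectives, maximizing the sum of utilities and maximizing the minimum utility, can be sketched by exhaustive search on a toy instance. This is a hedged illustration only: it assumes that an agent's utility is the sum of its preferences toward the agents seated on adjacent vertices, and the preference matrix, the path-shaped seat graph, and the helper names `utilities` and `best_arrangement` are hypothetical. Enumerating all n! assignments is feasible only for very small n, consistent with the NP-hardness stated in the abstract.

```python
from itertools import permutations


def utilities(pref, seat_edges, seat_of):
    """Utility per agent: sum of preferences toward agents on adjacent seats.

    seat_of[agent] is the seat (vertex) assigned to that agent.
    """
    agent_at = {seat: agent for agent, seat in enumerate(seat_of)}
    util = [0] * len(pref)
    for s, t in seat_edges:
        a, b = agent_at[s], agent_at[t]
        util[a] += pref[a][b]
        util[b] += pref[b][a]
    return util


def best_arrangement(pref, seat_edges, objective):
    """Try every assignment of agents to seats and keep the best one."""
    n = len(pref)
    return max(permutations(range(n)),
               key=lambda seat_of: objective(utilities(pref, seat_edges, seat_of)))


# pref[a][b] = how much agent a values sitting next to agent b (symmetric here).
pref = [
    [0, 3, 0, 1],
    [3, 0, 1, 0],
    [0, 1, 0, 2],
    [1, 0, 2, 0],
]
path = [(0, 1), (1, 2), (2, 3)]  # seat graph: a path on four seats

best_sum = best_arrangement(pref, path, sum)  # maximize total utility
best_min = best_arrangement(pref, path, min)  # maximize the worst-off agent

print(best_sum, utilities(pref, path, best_sum))
print(best_min, utilities(pref, path, best_min))
```

Passing the objective as a function (`sum` or `min`) keeps the two welfare variants in one search routine; the envy-freeness and exchange-stability variants would need a feasibility check instead of a score.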

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Cooperative games
Game Theory and Economic Paradigms -> GTEP: Fair division

794

Realistic Cell Type Annotation and Discovery for Single-cell RNA-seq Data

Yuyao Zhai, Liang Chen, Minghua Deng

The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. Cell type annotation plays an essential role in the substantial downstream analysis of scRNA-seq data. Existing methods usually classify the novel cell types in target data as an “unassigned” group and rarely discover the fine-grained cell type structure among them. Besides, these methods carry risks, such as susceptibility to batch effects between reference and target data, which further compromise the inherent discrimination of target data. Considering these limitations, here we propose a new and practical task called realistic cell type annotation and discovery for scRNA-seq data. In this task, cells from seen cell types are given class labels, while cells from novel cell types are given cluster labels. To tackle this problem, we propose an end-to-end algorithm framework called scPOT from the perspective of optimal transport (OT). Specifically, we first design an OT-based prototypical representation learning paradigm to encourage both global discrimination of clusters and local consistency of cells to uncover the intrinsic structure of target data. Then we propose an unbalanced OT-based partial alignment strategy with statistical filling to detect the cells from the seen cell types across reference and target data. Notably, scPOT also introduces an easy yet effective solution to automatically estimate the overall cell type number in target data. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scPOT over various state-of-the-art clustering and annotation methods.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Machine Learning -> ML: Applications

795

FGNet: Towards Filling the Intra-class and Inter-class Gaps for Few-shot Segmentation

Yuxuan Zhang, Wei Yang, Shaowei Wang

Current few-shot segmentation (FSS) approaches have made tremendous achievements based on prototypical learning techniques. However, due to the scarcity of the support data provided, FSS methods still suffer from the intra-class and inter-class gaps. In this paper, we propose a uniform network to fill both the gaps, termed FGNet. It consists of the novel design of a Self-Adaptive Module (SAM) to emphasize the query feature to generate an enhanced prototype for self-alignment. Such a prototype caters to each query sample itself since it contains the underlying intra-instance information, which gets around the intra-class appearance gap. Moreover, we design an Inter-class Feature Separation Module (IFSM) to separate the feature space of the target class from other classes, which contributes to bridging the inter-class gap. In addition, we present several new losses and a method termed B-SLIC, which help to further enhance the separation performance of FGNet. Experimental results show that FGNet reduces both the gaps for FSS by SAM and IFSM respectively, and achieves state-of-the-art performances on both PASCAL-5i and COCO-20i datasets compared with previous top-performing approaches.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

797

Leveraging Argumentation for Generating Robust Sample-based Explanations

Leila Amgoud, Philippe Muller, Henri Trenquier

Explaining predictions made by inductive classifiers has become crucial with the rise of complex models acting more and more as black-boxes. Abductive explanations are one of the most popular types of explanations that are provided for the purpose. They highlight feature-values that are sufficient for making predictions. In the literature, they are generated by exploring the whole feature space, which is unreasonable in practice. This paper solves the problem by introducing explanation functions that generate abductive explanations from a sample of instances. It shows that such functions should be defined with great care since they cannot satisfy two desirable properties at the same time, namely existence of explanations for every individual decision (success) and correctness of explanations (coherence). The paper provides a parameterized family of argumentation-based explanation functions, each of which satisfies one of the two properties. It studies their formal properties and their experimental behaviour on different datasets.

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability

800

Parametrized Gradual Semantics Dealing with Varied Degrees of Compensation

Dragan Doder, Leila Amgoud, Srdjan Vesic

Compensation is a strategy that a semantics may follow when it faces dilemmas between quality and quantity of attackers. It allows several weak attacks to compensate one strong attack. It is thus based on a compensation degree, which is a pair of two parameters: i) a parameter showing to what extent an attack is weak, and ii) a parameter indicating the number of weak attackers needed to compensate a strong one. Existing principles on compensation do not specify the parameters, thus it is unclear whether semantics satisfying them compensate at only one degree or several degrees, and which ones. This paper proposes a parameterised family of gradual semantics that is based on a parameter $\alpha$ taking values from the interval $(0,+\infty)$, each of which leads to a different semantics. The family unifies multiple semantics that share some principles but differ in their strategy regarding solving dilemmas. Indeed, we show that the two semantics taking the extreme values of $\alpha$ favor respectively quantity and quality, while all the remaining ones compensate at any degree. We define three classes of compensation degrees and show that the novel family is able to compensate at any of them while none of the existing gradual semantics does.

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation

809

Solving Quantum-Inspired Perfect Matching Problems via Tutte’s Theorem-Based Hybrid Boolean Constraints

Moshe Vardi, Zhiwei Zhang

Determining the satisfiability of Boolean constraint-satisfaction problems with different types of constraints, that is hybrid constraints, is a well-studied problem with important applications. We study here a new application of hybrid Boolean constraints, which arises in quantum computing. The problem relates to constrained perfect matching in edge-colored graphs. While general-purpose hybrid constraint solvers can be powerful, we show that direct encodings of the constrained-matching problem as hybrid constraints scale poorly and special techniques are still needed. We propose a novel encoding based on Tutte’s Theorem in graph theory as well as optimization techniques. Empirical results demonstrate that our encoding, in suitable languages with advanced SAT solvers, scales significantly better than a number of competing approaches on constrained-matching benchmarks. Our study identifies the necessity of designing problem-specific encodings when applying powerful general-purpose constraint solvers.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling

827

Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention

Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Yuanzhu Gan, Jian Pu

The monocular depth estimation task has recently shown encouraging prospects, especially for autonomous driving. To tackle the ill-posed problem of 3D geometric reasoning from 2D monocular images, multi-frame monocular methods have been developed to leverage the perspective correlation information from sequential temporal frames. However, moving objects like cars and trains usually violate the static scene assumption, leading to feature inconsistency deviation and misaligned cost values, which would mislead the optimization algorithm. In this work, we present CTA-Depth, a Context-aware Temporal Attention guided network for multi-frame monocular Depth estimation. Specifically, we first apply a multi-level attention enhancement module to integrate multi-level image features for obtaining an initial depth and pose estimation. Then the proposed CTA-Refiner is adopted to optimize the depth and pose iteratively. During the CTA-Refiner process, context-aware temporal attention (CTA) is developed to capture the global temporal-context correlations for keeping the feature consistency and estimation integrity of moving objects. In particular, we propose the long-range geometry embedding (LGE) module to produce a long-range temporal geometry prior. Our approach achieves significant improvements (e.g., 13.5% for Abs Rel on the KITTI dataset) over state-of-the-art approaches on three benchmark datasets. We will release our code for implementation after paper acceptance.

List of keywords

Computer Vision -> CV: 3D computer vision

830

InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning

Yutong Ye, Yingbo Zhou, Jiepin Ding, Ting Wang, Mingsong Chen, Xiang Lian

Although Reinforcement Learning (RL) has been extensively studied for Traffic Signal Control (TSC), it still suffers from high learning costs. This is because the trial-and-error attempts during the learning process of RL agents result in poor performance at the beginning and slow convergence to an optimal solution. To address this issue, this paper proposes a novel Adversarial Inverse Reinforcement Learning (AIRL)-based approach, named InitLight, which can generate an effective initial model to improve the jump-start performance for multi-intersection TSC. To be concrete, we design an adversarial architecture to pre-train the RL model from expert trajectories by the learned reward function and transfer the trained initial model to a multi-intersection environment. Based on our proposed pre-training method, the generalizability and robustness of the initial model can be significantly improved. Comprehensive experimental results obtained from various well-known traffic benchmarks show that, compared with the state-of-the-art RL-based TSC methods, InitLight can not only converge faster to a competitive result, but also achieve near-optimal performance after the first episode and be robust to various traffic scenarios.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Transportation
Machine Learning -> ML: Deep reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Sensor networks and smart cities

847

Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method

Xiangyang Zhu, Yiling Pan, Bailin Deng, Bin Wang

Recovering the shape and appearance of real-world objects from natural 2D images is a long-standing and challenging inverse rendering problem. In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras. Our method follows an analysis-by-synthesis approach and consists of two phases. In the initialization phase, we use traditional SfM and MVS methods to reconstruct a virtual scene roughly matching the real scene. Then in the optimization phase, we adopt a hybrid approach to refine the geometry and reflectance, where the geometry is first optimized using an approximate differentiable rendering method, and the reflectance is optimized afterward using a physically-based differentiable rendering method. Our hybrid approach combines the efficiency of approximate methods with the high-quality results of physically-based methods. Extensive experiments on synthetic and real data demonstrate that our method can produce reconstructions with similar or higher quality than state-of-the-art methods while being more efficient.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications

848

Label Enhancement via Joint Implicit Representation Clustering

Yunan Lu, Weiwei Li, Xiuyi Jia

Label distribution is an effective label form to portray label polysemy (i.e., the cases that an instance can be described by multiple labels simultaneously). However, the expensive annotating cost of label distributions limits its application to a wider range of practical tasks. Therefore, LE (label enhancement) techniques are extensively studied to solve this problem. Existing LE algorithms mostly estimate label distributions by the instance relation or the label relation. However, they suffer from biased instance relations, limited model capabilities, or suboptimal local label correlations. Therefore, in this paper, we propose a deep generative model called JRC to simultaneously learn and cluster the joint implicit representations of both features and labels, which can be used to improve any existing LE algorithm involving the instance relation or local label correlations. Besides, we develop a novel label distribution recovery module, and then integrate it with JRC model, thus constituting a novel generative label enhancement model that utilizes the learned joint implicit representations and instance clusters in a principled way. Finally, extensive experiments validate our proposal.

List of keywords

Machine Learning -> ML: Multi-label
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Weakly supervised learning

856

Graph Sampling-based Meta-Learning for Molecular Property Prediction

Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, Huajun Chen

Molecular properties are usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Second, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show that GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module.

List of keywords

Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Few-shot learning
Multidisciplinary Topics and Applications -> MDA: Bioinformatics

863

TG-VQA: Ternary Game of Video Question Answering

Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning about the semantic alignment between them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignment. In this work, we resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate fine-grained visual-linguistic alignment labels without label-intensive efforts. Our TG-VQA outperforms the existing state of the art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges well on limited data (10^4 videos), surpassing most of those pre-trained on large-scale data (10^7 videos).

List of keywords

Computer Vision -> CV: Visual reasoning and symbolic representation
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Video analysis and understanding

879

Data Level Lottery Ticket Hypothesis for Vision Transformers

Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method, called the winning ticket, such that it can be trained from scratch to perform almost as well as the dense counterpart. Meanwhile, LTH has scarcely been evaluated in vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input data consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The source code is available at https://github.com/shen494157765/vit-lottery-ticket-input.

List of keywords

Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Theory of deep learning

880

A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

Geometry problem solving (GPS) is a high-level mathematical reasoning task requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate the research of GPS, we build a new large-scale and finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution program. Experiments on PGPS9K and an existing dataset Geometry3K validate the superiority of our method over the state-of-the-art neural solvers. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS.

List of keywords

Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Multi-modal learning
Multidisciplinary Topics and Applications -> MDA: Education

882

Incomplete Multi-view Clustering via Prototype-based Imputation

Haobin Li, Yunfan Li, Mouxing Yang, Peng Hu, Dezhong Peng, Xi Peng

In this paper, we study how to achieve two characteristics highly expected of incomplete multi-view clustering (IMvC): i) instance commonality, meaning that within-cluster instances should share a common pattern, and ii) view versatility, meaning that cross-view samples should own view-specific patterns. To this end, we design a novel dual-stream model which employs a dual attention layer and a dual contrastive learning loss to learn view-specific prototypes and model the sample-prototype relationship. When a view is missing, our model performs data recovery using the prototypes in the missing view and the sample-prototype relationship inherited from the observed view. Thanks to our dual-stream model, both cluster- and view-specific information can be captured, and thus the instance commonality and view versatility can be preserved to facilitate IMvC. Extensive experiments demonstrate the superiority of our method on five challenging benchmarks compared with 11 approaches. The code can be accessed from https://pengxi.me.

List of keywords

Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering

892

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen

The top-down and bottom-up methods are two mainstreams of referring segmentation, while both methods have their own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that two types of methods are highly complementary for restraining respective weaknesses but the direct average combination leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit complementary nature of two types of methods on both interaction and integration aspects for achieving a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to bottom-up branch and provides fine-grained information to top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the gaussian performance distributions of two branches and weighted integrates results by sampling confident scores from the distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which justifies effectiveness and generality of our method.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Segmentation

902

Cross-Domain Facial Expression Recognition via Disentangling Identity Representation

Tong Liu, Jing Li, Jia Wu, Lefei Zhang, Shanshan Zhao, Jun Chang, Jun Wan

Most existing cross-domain facial expression recognition (FER) works require target domain data to assist the model in analyzing distribution shifts to overcome negative effects. However, it is often hard to obtain expression images of the target domain in practical applications. Moreover, existing methods suffer from the interference of identity information, thus limiting the discriminative ability of the expression features. We exploit the idea of domain generalization (DG) and propose a representation disentanglement model to address the above problems. Specifically, we learn three independent potential subspaces corresponding to the domain, expression, and identity information from facial images. Meanwhile, the extracted expression and identity features are recovered as Fourier phase information reconstructed images, thereby ensuring that the high-level semantics of images remain unchanged after disentangling the domain information. Our proposed method can disentangle expression features from expression-irrelevant ones (i.e., identity and domain features). Therefore, the learned expression features exhibit sufficient domain invariance and discriminative ability. We conduct experiments with different settings on multiple benchmark datasets, and the results show that our method achieves superior performance compared with state-of-the-art methods.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning

906

A Unification Framework for Euclidean and Hyperbolic Graph Neural Networks

Mehrdad Khatir, Nurendra Choudhary, Sutanay Choudhury, Khushbu Agarwal, Chandan K Reddy

Hyperbolic neural networks are able to capture the inherent hierarchy of graph datasets and are consequently a powerful choice of GNN. However, they entangle multiple incongruent (gyro-)vector spaces within a layer, which limits their generalization and scalability. In this work, we propose to use the Poincaré disk model as our search space and apply all approximations on the disk (as if the disk is a tangent space derived from the origin), thus getting rid of all inter-space transformations. Such an approach enables us to propose a hyperbolic normalization layer, and to further simplify the entire hyperbolic model to a Euclidean model cascaded with our hyperbolic normalization layer. We apply our proposed nonlinear hyperbolic normalization to the current state-of-the-art homogeneous and multi-relational graph networks. We demonstrate that the model not only leverages the power of Euclidean networks, such as interpretability and efficient execution of various model components, but also outperforms both Euclidean and hyperbolic counterparts in our benchmarks.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Representation learning

920

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu

Pre-training has marked numerous state of the arts in high-level computer vision, while few attempts have ever been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a whole set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information to intermediate layers in super-resolution (SR), yielding significant performance gains, while pre-training hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than other alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformers and CNNs. Based on the study, we successfully develop state-of-the-art models for multiple low-level tasks.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications
Computer Vision -> CV: Representation learning

924

STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary 2D Exemplar?

Xin Zhao, Jifeng Guo, Lin Wang, Fanqi Li, Jiahao Li, Junteng Zheng, Bo Yang

Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in computational photography. However, existing methods generally fail to accurately learn arbitrary textures, which may result in the failure to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to extend the given 2D exemplar to arbitrary 3D solid textures. In STS-GAN, multi-scale 2D texture discriminators evaluate the similarity between the given 2D exemplar and slices from the generated 3D texture, promoting the 3D texture generator synthesizing realistic solid textures. Finally, experiments demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the 2D exemplar.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs

928

PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification

Zhenyu Liu, Zhenran Xu, Baotian Hu, Min Zhang

Event Causality Identification (ECI) aims to identify the causality between a pair of event mentions in a document, which is composed of sentence-level ECI (SECI) and document-level ECI (DECI). Previous work applies various reasoning models to identify the implicit event causality. However, they indiscriminately reason all event causality in the same way, ignoring that most inter-sentence event causality depends on intra-sentence event causality to infer. In this paper, we propose a progressive graph pairwise attention network (PPAT) to consider the above dependence. PPAT applies a progressive reasoning strategy, as it first predicts the intra-sentence event causality, and then infers the more implicit inter-sentence event causality based on the SECI result. We construct a sentence boundary event relational graph, and PPAT leverages a simple pairwise attention mechanism, which attends to different reasoning chains on the graph. In addition, we propose a causality-guided training strategy for assisting PPAT in learning causality-related representations on every layer. Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank).

List of keywords

Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Information extraction

930

A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning

Lang Qin, Rui Yan, Huajin Tang

In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) in different algorithms and different environments.

List of keywords

Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Cognitive robotics

952

Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods

Xinyuan Liu, Hang Xu, Bin Chen, Qiang Zhao, Yike Ma, Chenggang Yan, Feng Dai

Object detection on panoramic/spherical images has developed rapidly in the past few years, where the IoU calculator is a fundamental part of various detector components, i.e., Label Assignment, Loss and NMS. Due to the low efficiency and non-differentiability of spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key of these approximate methods is to map spherical boxes to planar boxes. However, there exist two problems in these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. These problems lead to the low accuracy of these methods. Taking the two problems into account, we propose a new sphere-plane boxes transform, called Sph2Pob. Based on Sph2Pob, we propose (1) a differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time-cost and high accuracy and (2) an agent Loss, Sph2Pob-Loss, for spherical detection with high flexibility and expansibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Scene analysis and understanding

953

APR: Online Distant Point Cloud Registration through Aggregated Point Cloud Reconstruction

Quan Liu, Yunsong Zhou, Hongzi Zhu, Shan Chang, Minyi Guo

For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes. Code is available at https://github.com/liuQuan98/APR.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Machine learning for vision

962

Quick Multi-Robot Motion Planning by Combining Sampling and Search

Keisuke Okumura, Xavier Défago

We propose a novel algorithm to solve multi-robot motion planning (MRMP) rapidly, called Simultaneous Sampling-and-Search Planning (SSSP). Conventional MRMP studies mostly take the form of two-phase planning that constructs roadmaps and then finds inter-robot collision-free paths on those roadmaps. In contrast, SSSP simultaneously performs roadmap construction and collision-free pathfinding. This is realized by uniting techniques of single-robot sampling-based motion planning and search techniques of multi-agent pathfinding on discretized spaces. Doing so builds a small search space, leading to quick MRMP. SSSP is guaranteed to eventually find a solution if one exists. Our empirical evaluations in various scenarios demonstrate that SSSP significantly outperforms standard approaches to MRMP, i.e., it solves more problem instances much faster. We also applied SSSP to planning for 32 ground robots in a dense situation.

List of keywords

974

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko

Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness crucially decides the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies showed the effectiveness of encouraging exploration with intrinsic rewards estimated from novelty in observations. However, there is a gap between the novelty of an observation and an exploration in general, because the stochasticity in the environment as well as the behavior of an agent may affect the observation. To estimate exploratory behaviors accurately, we propose DEIR, a novel method where we theoretically derive an intrinsic reward from a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and materialize the reward with a discriminative forward model. We conduct extensive experiments in both standard and hardened exploration games in MiniGrid to show that DEIR quickly learns a better policy than baselines. Our evaluations in ProcGen demonstrate both generalization capabilities and the general applicability of our intrinsic reward.

List of keywords

Machine Learning -> ML: Reinforcement learning

978

Helpful Information Sharing for Partially Informed Planning Agents

Sarah Keren, David Wies, Sara Bernardini

In many real world settings, an autonomous agent may not have sufficient information and sensory capabilities to accomplish its goals, even when they are achievable. In some cases, the needed information can be provided by another agent, but information sharing might be costly due to limited communication bandwidth and other constraints. We address the problem of Helpful Information Sharing (HIS), which focuses on selecting minimal information to reveal to the partially informed agent (or actor) in order to guarantee it can achieve its goal. As the space of possible information items to share may be large, it is crucial to devise efficient methods to identify optimal interventions that represent the sharing of only information that is critical for task completion and that cannot be acquired by the agent on its own. For this purpose, we offer a novel compilation of HIS to a classical planning problem that can be solved efficiently by any off-the-shelf solver. We provide guarantees of optimality for our approach and describe its extensions to support maximizing robustness and to settings in which the agent needs to decide which sensors to deploy in the environment. We demonstrate the efficiency of our approaches on a set of standard benchmarks as well as on a novel benchmark of an Escape Room.

List of keywords

Planning and Scheduling -> PS: Planning with Incomplete Information
Planning and Scheduling -> PS: Model-based reasoning

981

Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation

Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu

Audio visual segmentation (AVS) aims to segment the sounding objects for each frame of a given video. To distinguish the sounding objects from silent ones, both audio-visual semantic correspondence and temporal interaction are required. The previous method applies multi-frame cross-modal attention to conduct pixel-level interactions between audio features and visual features of multiple frames simultaneously, which is both redundant and implicit. In this paper, we propose an Audio-Queried Transformer architecture, AQFormer, where we define a set of object queries conditioned on audio information and associate each of them to particular sounding objects. Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries. Besides, an Audio-Bridged Temporal Interaction module is proposed to exchange sounding object-relevant information among multiple frames with the bridge of audio features. Extensive experiments are conducted on two AVS benchmarks to show that our method achieves state-of-the-art performances, especially 7.1% M_J and 7.6% M_F gains on the MS3 setting.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Video analysis and understanding

990

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, Junbo Zhao

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To fulfill this, we propose a novel LNL framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and matched predictions with given labels to dynamically expand a base clean sample set. To overcome the potential side effect of excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.

List of keywords

Machine Learning -> ML: Weakly supervised learning
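
The matched high confidence selection mentioned in the ProMix abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `matched_high_confidence`, the threshold `tau`, and the toy probabilities are all assumptions made for illustration. The sketch keeps a sample only when the classifier's top prediction agrees with the given label and the confidence reaches the threshold.

```python
from typing import List, Sequence


def matched_high_confidence(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    tau: float = 0.9,  # hypothetical confidence threshold
) -> List[int]:
    """Return indices of samples whose predicted class equals the given
    (possibly noisy) label and whose confidence is at least tau."""
    selected = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=lambda k: p[k])  # argmax class
        if pred == y and p[pred] >= tau:
            selected.append(i)
    return selected


# Toy class probabilities for four samples (made-up numbers).
probs = [
    [0.95, 0.03, 0.02],  # confident and matches label 0
    [0.50, 0.30, 0.20],  # matches label 0 but confidence is low
    [0.05, 0.93, 0.02],  # confident, but the given label is 2
    [0.01, 0.02, 0.97],  # confident and matches label 2
]
labels = [0, 0, 2, 2]
print(matched_high_confidence(probs, labels))  # [0, 3]
```

Lowering `tau` admits more samples into the clean set at the cost of a higher risk of including mislabeled ones, which is the trade-off the abstract alludes to.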

997

Outsourcing Adjudication to Strategic Jurors

Ioannis Caragiannis, Nikolaj Schwartzbach

We study a scenario where an adjudication task (e.g., the resolution of a binary dispute) is outsourced to a set of agents who are appointed as jurors. This scenario is particularly relevant in a Web3 environment, where no verification of the adjudication outcome is possible, and the appointed agents are, in principle, indifferent to the final verdict. We consider simple adjudication mechanisms that use (1) majority voting to decide the final verdict and (2) a payment function to reward the agents with the majority vote and possibly punish the ones in the minority. Agents interact with such a mechanism strategically: they exert some effort to understand how to properly judge the dispute and cast a yes/no vote that depends on this understanding and on information they have about the rest of the votes. Eventually, they vote so that their utility (i.e., their payment from the mechanism minus the cost due to their effort) is maximized. Under reasonable assumptions about how an agent’s effort is related to her understanding of the dispute, we show that appropriate payment functions can be used to recover the correct adjudication outcome with high probability. Our findings follow from a detailed analysis of the induced strategic game and make use of both theoretical arguments and simulation experiments.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
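
The mechanism described in the abstract above (majority voting plus a payment function that rewards the majority and may punish the minority) can be sketched as follows. The names `adjudicate` and `utility`, the flat `reward` and `penalty` amounts, and the odd-jury assumption are illustrative choices, not the payment functions analyzed in the paper.

```python
from typing import List, Tuple


def adjudicate(
    votes: List[bool], reward: float, penalty: float
) -> Tuple[bool, List[float]]:
    """Return the majority verdict and one payment per juror.

    Assumption of this sketch: an odd number of jurors, so ties cannot occur.
    Jurors voting with the majority receive `reward`; the others pay `penalty`.
    """
    if len(votes) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of jurors")
    yes_votes = sum(votes)
    verdict = yes_votes > len(votes) - yes_votes
    payments = [reward if vote == verdict else -penalty for vote in votes]
    return verdict, payments


def utility(payment: float, effort_cost: float) -> float:
    """A juror's utility: payment from the mechanism minus the cost of effort."""
    return payment - effort_cost


verdict, payments = adjudicate([True, True, False], reward=1.0, penalty=0.5)
print(verdict, payments)  # True [1.0, 1.0, -0.5]
```

A strategic juror would compare `utility` across effort levels and votes, which is the game the paper analyzes; the sketch only fixes the bookkeeping.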

1003

New Fairness Concepts for Allocating Indivisible Items

Ioannis Caragiannis, Jugal Garg, Nidhi Rathi, Eklavya Sharma, Giovanna Varricchio

For the fundamental problem of fairly dividing a set of indivisible items among agents, envy-freeness up to any item (EFX) and maximin share fairness (MMS) are arguably the most compelling fairness concepts proposed till now. Unfortunately, despite significant efforts over the past few years, whether EFX allocations always exist is still an enigmatic open problem, let alone their efficient computation. Furthermore, today we know that MMS allocations are not always guaranteed to exist. These facts weaken the usefulness of both EFX and MMS, despite their appealing conceptual characteristics. We propose two alternative fairness concepts, called epistemic EFX (EEFX) and minimum EFX value fairness (MXS), inspired by EFX and MMS. For both, we explore their relationships to well-studied fairness notions and, more importantly, prove that EEFX and MXS allocations always exist and can be computed efficiently for additive valuations. Our results justify that the new fairness concepts are excellent alternatives to EFX and MMS.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice

1009

Dynamic flows on curved space generated by labeled data

Xinru Hua, Truyen Nguyen, Tam Le, Jose Blanchet, Viet Anh Nguyen

The scarcity of labeled data is a long-standing challenge for many machine learning tasks. We propose our gradient flow method to leverage the existing dataset (i.e., source) to generate new samples that are close to the dataset of interest (i.e., target). We lift both datasets to the space of probability distributions on the feature-Gaussian manifold, and then develop a gradient flow method that minimizes the maximum mean discrepancy loss. To perform the gradient flow of distributions on the curved feature-Gaussian space, we unravel the Riemannian structure of the space and compute explicitly the Riemannian gradient of the loss function induced by the optimal transport metric. For practical applications, we also propose a discretized flow, and provide conditional results guaranteeing the global convergence of the flow to the optimum. We illustrate the results of our proposed gradient flow method on several real-world datasets and show our method can improve the accuracy of classification models in transfer learning settings.

List of keywords

Machine Learning -> ML: Optimization
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Few-shot learning
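
For readers unfamiliar with the maximum mean discrepancy loss named in the abstract above, the sketch below computes the standard biased estimate of squared MMD between two small samples. It is only the plain Euclidean version with a Gaussian kernel: the kernel choice, the `bandwidth` parameter, and the function names are assumptions for illustration, and the paper's feature-Gaussian manifold and Riemannian gradient flow are not reproduced here.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(
    xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0
) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""

    def mean_kernel(a: Sequence[Point], b: Sequence[Point]) -> float:
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


source = [(0.0, 0.0), (1.0, 0.0)]  # toy "source" samples
target = [(3.0, 3.0), (4.0, 3.0)]  # toy "target" samples
print(mmd_squared(source, source))  # zero for identical samples
print(mmd_squared(source, target))  # positive for separated samples
```

A flow-based method would move the source samples so that this quantity decreases toward zero.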

1021

Fair Division with Two-Sided Preferences

Ayumi Igarashi, Yasushi Kawase, Warut Suksompong, Hanna Sumita

We study a fair division setting in which a number of players are to be fairly distributed among a set of teams. In our model, not only do the teams have preferences over the players as in the canonical fair division setting, but the players also have preferences over the teams. We focus on guaranteeing envy-freeness up to one player (EF1) for the teams together with a stability condition for both sides. We show that an allocation satisfying EF1, swap stability, and individual stability always exists and can be computed in polynomial time, even when teams may have positive or negative values for players. Similarly, a balanced and swap stable allocation that satisfies a relaxation of EF1 can be computed efficiently. When teams have nonnegative values for players, we prove that an EF1 and Pareto optimal allocation exists and, if the valuations are binary, can be found in polynomial time. We also examine the compatibility between EF1 and justified envy-freeness.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
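
To make the EF1 condition in the abstract above concrete, here is a minimal sketch of an EF1 check for teams, assuming additive and nonnegative team valuations. The abstract also covers negative values, which this sketch does not handle, and it ignores the stability conditions and the players' preferences entirely. The data layout (`allocation`, `values`) and the function name are hypothetical.

```python
def is_ef1(allocation, values):
    """Check envy-freeness up to one player for teams.

    allocation: dict mapping team -> set of players assigned to it.
    values: dict mapping team -> dict mapping player -> nonnegative value.
    Assumes additive valuations with nonnegative values only.
    """

    def worth(team, bundle):
        return sum(values[team][player] for player in bundle)

    for team in allocation:
        own = worth(team, allocation[team])
        for other in allocation:
            if other == team:
                continue
            bundle = allocation[other]
            if own >= worth(team, bundle):
                continue  # no envy toward this team
            # Envy must disappear after dropping the single most valued player.
            best = max(values[team][player] for player in bundle)
            if own < worth(team, bundle) - best:
                return False
    return True
```

With three equally valued players and two teams, giving all three players to one team violates EF1, while a two-and-one split satisfies it.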

1022

Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution

Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li

Recently, Decision Transformer (DT) pioneered the reformulation of offline RL as a contextual conditional sequence modeling paradigm, which leverages self-attended autoregression to learn from global target rewards, states, and actions. However, in many applications these signals are severely delayed; for example, the agent may obtain a reward signal only at the end of each trajectory. This delay causes an unwanted bias that accumulates when global signals are learned autoregressively. In this paper, we focus on a representative case, episodic reinforcement learning with trajectory feedback. We propose a new reward redistribution algorithm that learns parameterized reward functions and decomposes the long-delayed reward onto each timestep. To improve the adaptation ability of the redistribution, we formulate this decomposition as a bi-level optimization problem to seek a global optimum. We extensively evaluate the proposed method on various benchmarks and demonstrate an overwhelming performance improvement under long-delayed settings.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: POMDPs
Uncertainty in AI -> UAI: Sequential decision making

1028

Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms

Yihan Zhang, Meikang Qiu, Hongchang Gao

Numerous machine learning models can be formulated as a stochastic minimax optimization problem, such as imbalanced data classification with AUC maximization. Developing efficient algorithms to optimize such kinds of problems is of importance and necessity. However, most existing algorithms restrict their focus to the single-machine setting so that they are incapable of dealing with the large communication overhead in a distributed training system. Moreover, most existing communication-efficient optimization algorithms only focus on the traditional minimization problem, failing to handle the minimax optimization problem. To address these challenging issues, in this paper, we develop two novel communication-efficient stochastic gradient descent ascent with momentum algorithms for the distributed minimax optimization problem, which can significantly reduce the communication cost via the two-way compression scheme. However, the compressed momentum makes it considerably challenging to investigate the convergence rate of our algorithms, especially in the presence of the interaction between the minimization and maximization subproblems. In this paper, we successfully address these challenges and establish the convergence rate of our algorithms for nonconvex-strongly-concave problems. To the best of our knowledge, our algorithms are the first communication-efficient algorithms with theoretical guarantees for the minimax optimization problem. Finally, we apply our algorithms to the distributed AUC maximization problem for the imbalanced data classification task. Extensive experimental results confirm the efficacy of our algorithms in saving communication cost.

List of keywords

Data Mining -> DM: Parallel, distributed and cloud-based high performance mining

1032

Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations?

Jingyuan Sun, Sien Moens

To decipher the algorithm underlying the human brain’s language representation, previous work probed brain responses to language input with pre-trained artificial neural network (ANN) models fine-tuned on NLU tasks. However, fine-tuning generally updates the full parametric space and distorts pre-trained features, which is cognitively inconsistent with the brain’s robust multi-task learning ability. Prompt-tuning, in contrast with fine-tuning, protects pre-trained weights and learns task-specific embeddings to fit a task. Could prompt-tuning generate representations that better account for the brain’s language representations than fine-tuning? If so, what kind of NLU task leads a pre-trained model to better decode the information represented in the human brain? We investigate these questions by comparing prompt-tuned and fine-tuned representations in neural decoding, that is, predicting the linguistic stimulus from the brain activities evoked by the stimulus. We find that fine-tuning does not significantly outperform prompt-tuning in neural decoding on any of the 10 NLU tasks, implying that a more brain-consistent tuning method yields representations that better correlate with the brain data. Moreover, we identify that tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks, especially the syntactic chunking task. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information when representing languages.

List of keywords

Natural Language Processing -> NLP: Embeddings
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive modeling

1034

Multi-Agent Systems with Quantitative Satisficing Goals

Senthil Rajasekaran, Suguman Bansal, Moshe Vardi

In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of epsilon-equilibria and the existence of equilibria in games where agents have multiple thresholds.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning

1045

Learning Efficient Truthful Mechanisms for Trading Networks

Takayuki Osogami, Segev Wasserkrug, Elisheva S. Shamash

Trading networks are an indispensable part of today’s economy, but to compete successfully with others, they must be efficient in maximizing the value they provide to the external market. While the prior work relies on truthful disclosure of private information to achieve efficiency, we study the problem of designing mechanisms that result in efficient trading networks by incentivizing firms to truthfully reveal their private information to a third party. Additional desirable properties of such mechanisms are weak budget balance (WBB; the third party need not invest) and individual rationality (IR; firms get non-negative utility). Unlike combinatorial auctions, there may not exist mechanisms that simultaneously satisfy these properties ex post for trading networks. We propose an approach for computing or learning truthful and efficient mechanisms for given networks in a Bayesian setting, where WBB and IR, respectively, are relaxed to ex ante and interim for a given distribution over the private information. We incorporate techniques to reduce computational and sample complexity. We empirically demonstrate that the proposed approach successfully finds the mechanisms with the relaxed properties for trading networks where achieving ex post properties is impossible.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Mechanism design
Multidisciplinary Topics and Applications -> MDA: Economics

1063

Black-box Prompt Tuning for Vision-Language Model as a Service

Lang Yu, Qin Chen, Jiaju Lin, Liang He

In the scenario of Model-as-a-Service (MaaS), pre-trained models are usually released as inference APIs. Users are allowed to query those models with manually crafted prompts. Without accessing the network structure and gradient information, it’s tricky to perform continuous prompt tuning on MaaS, especially for vision-language models (VLMs) considering cross-modal interaction. In this paper, we propose a black-box prompt tuning framework for VLMs to learn task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in the intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance the cross-modal interaction. Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner.

List of keywords

Computer Vision -> CV: Vision and language
Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Multi-modal learning

1065

An Exact Algorithm for the Minimum Dominating Set Problem

Hua Jiang, Zhifei Zheng

The Minimum Dominating Set (MDS) problem is a classic NP-hard combinatorial optimization problem with many practical applications. Solving MDS is computationally extremely challenging. Previous work on exact algorithms mainly focuses on improving the theoretical time complexity, and existing practical algorithms for MDS are almost all based on heuristic search. In this paper, we propose a novel lower bound and an exact algorithm for MDS. The algorithm implements a branch-and-bound (BnB) approach and employs the new lower bound to reduce the search space. Extensive empirical results show that the new lower bound is efficient in reducing the search space and the new algorithm is effective for the standard instances and real-world instances. To the best of our knowledge, this is the first effective BnB algorithm for MDS.

List of keywords

Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search

1067

Few-shot Classification via Ensemble Learning with Multi-Order Statistics

Sai Yang, Fan Liu, Delong Chen, Jun Zhou

Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining a representation of images that generalizes well to novel classes is the key to improving the few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error in the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper. In this method, after the backbone network, we use multiple branches to create the individual learners in the ensemble learning, with the goal of reducing the storage cost. We then introduce different order statistics pooling in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and our method can produce a state-of-the-art performance on multiple few-shot classification benchmark datasets.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1068

A Refined Upper Bound and Inprocessing for the Maximum k-plex Problem

Hua Jiang, Fusheng Xu, Zhifei Zheng, Bowen Wang, Wei Zhou

A k-plex of a graph G is an induced subgraph in which every vertex has at most k-1 nonadjacent vertices. The Maximum k-plex Problem (MKP) consists in finding a k-plex of the largest size, which is NP-hard and finds many applications. Existing exact algorithms mainly implement a branch-and-bound approach and improve performance by integrating effective upper bounds and graph reduction rules. In this paper, we propose a refined upper bound, which can derive a tighter upper bound than existing methods, and an inprocessing strategy, which performs graph reduction incrementally. We implement a new BnB algorithm for MKP that employs the two components to reduce the search space. Extensive experiments show that both the refined upper bound and the inprocessing strategy are very efficient in the reduction of search space. The new algorithm outperforms the state-of-the-art algorithms on the tested benchmarks significantly.

List of keywords

Search -> S: Combinatorial search and optimisation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Search -> S: Heuristic search
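
The k-plex definition in the abstract above translates directly into a membership check. The sketch below only verifies whether a given vertex set is a k-plex; it is not the branch-and-bound algorithm, the refined upper bound, or the inprocessing strategy. The adjacency-dictionary layout and the function name are assumptions made for this example.

```python
def is_k_plex(adjacency, vertices, k):
    """Return True if `vertices` induces a k-plex in the graph.

    adjacency: dict mapping each vertex to the set of its neighbours.
    A vertex set is a k-plex when every member is nonadjacent to at most
    k - 1 of the other members.
    """
    members = set(vertices)
    for v in members:
        non_adjacent = sum(
            1 for u in members if u != v and u not in adjacency[v]
        )
        if non_adjacent > k - 1:
            return False
    return True
```

For k = 1 the check reduces to testing for a clique, so the three-vertex path 1-2-3 fails as a 1-plex but passes as a 2-plex.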

1072

Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention

Xiangcheng Liu, Tianyi Wu, Guodong Guo

Vision transformer has emerged as a new paradigm in computer vision, showing excellent performance while accompanied by expensive computational cost. Image token pruning is one of the main approaches for ViT compression, due to the facts that the complexity is quadratic with respect to the token number, and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens, or implement a fixed ratio pruning strategy for different input instances. In this work, we propose an adaptive sparse token pruning framework with a minimal cost. Specifically, we firstly propose an inexpensive attention head importance weighted class attention scoring mechanism. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores and thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% and brings only 0.2% drop in top-1 accuracy, which achieves a better trade-off between accuracy and latency than the previous methods.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Attention models
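
The scoring-then-thresholding idea in the abstract above can be pictured with a small sketch. It assumes that a token's score is a head-importance-weighted average of the class token's attention to that token, and that tokens scoring below a threshold are dropped. The exact scoring formula, the way head importance is obtained, and the learned thresholds are not given in the abstract, so `head_weights`, `threshold`, and both function names are hypothetical.

```python
def token_scores(class_attention, head_weights):
    """Combine per-head class-token attention into one score per token.

    class_attention[h][j]: attention from the class token to token j in head h.
    head_weights[h]: assumed importance weight of head h.
    """
    total_weight = sum(head_weights)
    num_tokens = len(class_attention[0])
    return [
        sum(w * head[j] for w, head in zip(head_weights, class_attention))
        / total_weight
        for j in range(num_tokens)
    ]


def prune_tokens(tokens, scores, threshold):
    """Keep only the tokens whose score reaches the threshold."""
    return [t for t, s in zip(tokens, scores) if s >= threshold]
```

In the described method the thresholds are learnable and trained under a budget, so the number of kept tokens varies per input; here the threshold is just a fixed number supplied by the caller.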

1078

Hierarchical State Abstraction based on Structural Information Principles

Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu

State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.

List of keywords

Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning

1099

LGPConv: Learnable Gaussian Perturbation Convolution for Lightweight Pansharpening

Chen-Yu Zhao, Tian-Jing Zhang, Ran Ran, Zhi-Xuan Chen, Liang-Jian Deng

Pansharpening is a critical yet challenging low-level vision task that aims to obtain a high spatial resolution image by fusing a multispectral (MS) image and a panchromatic (PAN) image. While currently used pansharpening methods are based on convolutional neural networks (CNNs) with standard convolution operation, we observe a strong correlation along the channel dimension of the standard convolution kernel, resulting in a significant computational burden and a large amount of redundancy in the pansharpening neural network. In this work, we propose a novel Learnable Gaussian Perturbation Convolution (LGPConv) capable of replacing and surpassing the standard convolution. With theoretical analysis of the given approach, LGPConv simultaneously exploits two specific properties of standard convolution kernels: 1) correlations within channels: we only learn one premier kernel as a base for further expansion, significantly reducing the parameters and avoiding the difficulty of training caused by redundancy; 2) randomness within channels: we simulate randomness and differences among channels by applying perturbations with Gaussian noise, effectively realizing kernel expansion, which enhances the ability of its nonlinear representation. We demonstrate this new technical contribution in a well-designed LGPConv-based pansharpening network. Extensive experiments reveal that our method achieves the state-of-the-art with a minimal number of parameters, to the best of our knowledge.

List of keywords

Machine Learning -> ML: Convolutional networks
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Applications

1100

Improving Heterogeneous Model Reuse by Density Estimation

Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, Dacheng Tao

In this paper, we study the problem of multiparty learning, which aims to learn a model using private data from different participants. Model reuse is a promising solution for multiparty learning by assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches are developed. However, although the pre-trained local models are utilized in these approaches, the characteristics of the local data are not well exploited. This motivates us to estimate the density of local data and design an auxiliary model together with the local classifier for reuse. When some local models are not well pre-trained, we further design a multiparty cross-entropy loss for calibration. Unlike existing approaches, we address the heterogeneous model reuse problem from a decision theory perspective and take advantage of recent advances in density estimation. Experimental results on both synthetic and benchmark data demonstrate the superiority of the proposed method.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Classification

1111

Violin: Virtual Overbridge Linking for Enhancing Semi-supervised Learning on Graphs with Limited Labels

Siyue Xie, Da Sun Handason Tam, Wing Cheong Lau

Graph Neural Networks (GNNs) are a family of promising tools for graph semi-supervised learning. However, in training, most existing GNNs rely heavily on a large amount of labeled data, which is rare in real-world scenarios. Unlabeled data with useful information are usually under-exploited, which limits the representation power of GNNs. To handle these problems, we propose Virtual Overbridge Linking (Violin), a generic framework to enhance the learning capacity of common GNNs. By learning to add virtual overbridges between two nodes that are estimated to be semantic-consistent, labeled and unlabeled data can be correlated. Supervised information can be well utilized in training while simultaneously inducing the model to learn from unlabeled data. Discriminative relation patterns extracted from unlabeled nodes can also be shared with other nodes even if they are remote from each other. Motivated by recent advances in data augmentations, we additionally integrate Violin with the consistency regularized training. Such a scheme yields node representations with better robustness, which significantly enhances a GNN. Violin can be readily extended to a wide range of GNNs without introducing additional learnable parameters. Extensive experiments on six datasets demonstrate that our method is effective and robust under low-label rate scenarios, where Violin can boost some GNNs’ performance by over 10% on node classification.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Representation learning

1117

Co-training with High-Confidence Pseudo Labels for Semi-supervised Medical Image Segmentation

Zhiqiang Shen, Peng Cao, Hua Yang, Xiaoli Liu, Jinzhu Yang, Osmar R. Zaiane

Consistency regularization and pseudo labeling-based semi-supervised methods perform co-training using the pseudo labels from multi-view inputs. However, such co-training models tend to converge early to a consensus, degenerating to the self-training ones, and produce low-confidence pseudo labels from the perturbed inputs during training. To address these issues, we propose an Uncertainty-guided Collaborative Mean-Teacher (UCMT) for semi-supervised semantic segmentation with the high-confidence pseudo labels. Concretely, UCMT consists of two main components: 1) collaborative mean-teacher (CMT) for encouraging model disagreement and performing co-training between the sub-networks, and 2) uncertainty-guided region mix (UMIX) for manipulating the input images according to the uncertainty maps of CMT and facilitating CMT to produce high-confidence pseudo labels. Combining the strengths of UMIX with CMT, UCMT can retain model disagreement and enhance the quality of pseudo labels for the co-training segmentation. Extensive experiments on four public medical image datasets including 2D and 3D modalities demonstrate the superiority of UCMT over the state-of-the-art. Code is available at: https://github.com/Senyh/UCMT.

List of keywords

Machine Learning -> ML: Semi-supervised learning
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation

1132

Imbalanced Node Classification Beyond Homophilic Assumption

Jie Liu, Mengting He, Guangtao Wang, Quoc Viet Hung Nguyen, Xuequn Shang, Hongzhi Yin

Imbalanced node classification widely exists in real-world networks where graph neural networks (GNNs) are usually highly inclined to majority classes and suffer from severe performance degradation on classifying minority class nodes. Various imbalanced node classification methods have been proposed recently which construct synthetic nodes and edges w.r.t. minority classes to balance the label/topology distribution. However, they are all based on the homophilic assumption that nodes of the same label tend to connect, despite the wide existence of heterophilic edges in real-world graphs. Thus, they uniformly aggregate features from both homophilic and heterophilic neighbors and rely on feature similarity to generate synthetic edges, which cannot be applied to imbalanced graphs with high heterophily. To address this problem, we propose a novel GraphSANN for imbalanced node classification on both homophilic and heterophilic graphs. Firstly, we propose a unified feature mixer to generate synthetic nodes with both homophilic and heterophilic interpolation in a unified way. Next, by randomly sampling edges between synthetic nodes and existing nodes as candidate edges, we design an adaptive subgraph extractor to dynamically extract the contextual subgraphs of candidate edges with flexible ranges. Finally, we develop a multi-filter subgraph encoder which constructs multiple different filter channels to discriminatively aggregate neighbors’ information along the homophilic and heterophilic edges. Extensive experiments on eight benchmark datasets demonstrate the superiority of our model for imbalanced node classification on both homophilic and heterophilic graphs.

List of keywords

Data Mining -> DM: Mining graphs
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Learning graphical models

1134

Minimally Supervised Contextual Inference from Human Mobility: An Iterative Collaborative Distillation Framework

Jiayun Zhang, Xinyang Zhang, Dezhi Hong, Rajesh Gupta, Jingbo Shang

Inferring the context about trips and users from mobility data is valuable for mobile service providers to understand their customers and improve their services. Existing methods require a large number of labels for training, a requirement that is hard to meet in practice. In this paper, we study a more practical yet challenging setting: contextual inference using mobility data with minimal supervision (i.e., using a few labels per class and massive unlabeled data). A typical solution is to apply semi-supervised methods that follow a self-training framework to bootstrap a model based on all features. However, the minimal labeled set brings a high risk of overfitting to self-training, leading to unsatisfactory performance. We propose a novel collaborative distillation framework STCOLAB. Specifically, it sequentially trains spatial and temporal modules at each iteration following the supervision of demographic labels. In addition, it distills knowledge to the module being trained using the logits produced by the latest trained module of the other modality, thereby combining the knowledge learned by both modalities to mutually calibrate the modules. Extensive experiments on two real-world datasets show STCOLAB achieves significantly more accurate demographic inference than various baselines.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data

1137

Robust Image Ordinal Regression with Controllable Image Generation

Yi Cheng, Haochao Ying, Renjun Hu, Jinhong Wang, Wenhao Zheng, Xiao Zhang, Danny Chen, Jian Wu

Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)

1145

COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks

Fangyi Zhu, Stéphane Bressan, See Kiong Ng

Vision outlooker improves the performance of vision transformers, which implement a self-attention mechanism, by adding outlook attention, a form of local attention. In natural language processing, as has been the case in computer vision and other domains, transformer-based models constitute the state-of-the-art for most processing tasks. In this domain, too, many authors have argued and demonstrated the importance of local context. We present an outlook attention mechanism, COOL, for natural language processing. COOL, added on top of the self-attention layers of a transformer-based model, encodes local syntactic context considering word proximity and more pair-wise constraints than dynamic convolution used by existing approaches. A comparative empirical performance evaluation of an implementation of COOL with different transformer-based models confirms the opportunity for improvement over a baseline using the original models alone for various natural language processing tasks, including question answering. The proposed approach achieves competitive performance with existing state-of-the-art methods on some tasks.

List of keywords

Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Language models

1152

WBFlow: Few-shot White Balance for sRGB Images via Reversible Neural Flows

Chunxiao Li, Xuejing Kang, Anlong Ming

The white balance methods for sRGB images (sRGB-WB) aim to remove their non-linear color cast without access to raw values. Although the existing sRGB-WB methods have achieved increasingly better white balance (WB) results, their generalization to the sRGB images from multiple cameras is still under-explored. In this paper, we propose an sRGB-WB network named WBFlow, which not only performs superior white balance for sRGB images but also generalizes to multiple cameras well. In detail, we take advantage of neural flow to ensure the reversibility of WBFlow, which allows it to losslessly render color-cast sRGB images back to pseudo-raw features for linear white balancing, thus achieving superior performance. Furthermore, inspired by the inter-camera approach, we design a camera transformation (CT) in the pseudo-raw feature space for generalizing the WBFlow to different cameras via few-shot learning. Given a few sRGB images from an untrained camera, our WBFlow can perform well on this camera by learning the camera-specific parameters of CT from these images. Extensive experiments show that WBFlow achieves state-of-the-art multi-camera generalization and WB accuracy for sRGB images on three public datasets and our rendered multi-camera sRGB dataset.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications

1155

Deep Multi-view Subspace Clustering with Anchor Graph

Chenhang Cui, Yazhou Ren, Jingyu Pu, Xiaorong Pu, Lifang He

Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). To be specific, DMCAG firstly learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct an anchor graph with small size for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy to keep the consistency of the labels based on contrastive learning to enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.

List of keywords

Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Self-supervised Learning

1180

Faster Exact MPE and Constrained Optimization with Deterministic Finite State Automata

Filippo Bistaffa

We propose a concise function representation based on deterministic finite state automata for exact most probable explanation and constrained optimization tasks in graphical models. We then exploit our concise representation within Bucket Elimination (BE). We denote our version of BE as FABE. FABE significantly improves the performance of BE in terms of runtime and memory requirements by minimizing redundancy. Indeed, results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art, leading to significant runtime improvements (up to 2 orders of magnitude in our tests).

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Uncertainty in AI -> UAI: Graphical models

1194

Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren

Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards more precise action recognition while attention capitalizes on the important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets a new state of the art on three benchmark datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Action and behavior recognition

1201

Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning

Bin Zhang, Lijuan Li, Zhiwei Xu, Dapeng Li, Guoliang Fan

In multi-agent reinforcement learning (MARL), self-interested agents attempt to establish equilibrium and achieve coordination depending on game structure. However, existing MARL approaches are mostly bound by the simultaneous actions of all agents in the Markov game (MG) framework, and few works consider the formation of equilibrium strategies via asynchronous action coordination. In view of the advantages of Stackelberg equilibrium (SE) over Nash equilibrium, we construct a spatio-temporal sequential decision-making structure derived from the MG and propose an N-level policy model based on a conditional hypernetwork shared by all agents. This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents. Agents can learn heterogeneous SE policies while still maintaining parameter sharing, which leads to reduced cost for learning and storage and enhanced scalability as the number of agents increases. Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios, and performs admirably in immensely complex settings including cooperative tasks and mixed tasks.
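As a rough illustration of the Stackelberg equilibrium concept the abstract relies on (not the paper's N-level hypernetwork model), the sketch below solves a tiny two-player matrix game in which the leader commits first and the follower best-responds. The payoff matrices and the function name `stackelberg` are hypothetical and chosen only for the example.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) for a bimatrix game.

    Rows index the leader's actions, columns the follower's actions.
    Ties in the follower's best response go to the lowest column index.
    """
    best = None
    for a, follower_row in enumerate(follower_payoff):
        # The follower observes the leader's committed action and best-responds.
        b = max(range(len(follower_row)), key=lambda j: follower_row[j])
        value = leader_payoff[a][b]
        # The leader keeps the commitment that yields the highest payoff.
        if best is None or value > best[2]:
            best = (a, b, value)
    return best


# Hypothetical payoffs (illustrative only, not from the paper).
leader = [[2, 4], [1, 3]]
follower = [[1, 0], [0, 2]]
result = stackelberg(leader, follower)
print(result)
```

In this toy game, row 0 strictly dominates row 1 for the leader, so simultaneous play ends at (row 0, column 0) with a leader payoff of 2. Committing to row 1 first makes the follower answer with column 1, and the leader obtains 3, which is one way to see the advantage of SE over Nash equilibrium mentioned above.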

List of keywords

Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning

1213

Enhancing Datalog Reasoning with Hypertree Decompositions

Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks

Datalog reasoning based on the seminaive evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental reasoning with recursive Datalog programs. Moreover, hypertree decompositions require additional data structures and thus introduce nonnegligible overhead in both runtime and memory consumption. In this paper, we provide algorithms that exploit hypertree decompositions for the materialisation and incremental evaluation of Datalog programs. Furthermore, we combine this approach with standard Datalog reasoning algorithms in a modular fashion so that the overhead caused by the decompositions is reduced. Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude.
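For readers unfamiliar with the seminaive strategy named above, here is a minimal sketch of it on the classic recursive program `path(x,y) :- edge(x,y).` and `path(x,z) :- path(x,y), edge(y,z).` It shows only the standard baseline (each round joins just the newly derived facts), not the hypertree-decomposition algorithms of the paper; the function name and the example graph are assumptions made for illustration.

```python
def transitive_closure_seminaive(edges):
    """Materialise path/2 from edge/2 using seminaive evaluation."""
    # Index edge(y, z) by y so the join path(x, y), edge(y, z) is a lookup.
    successors = {}
    for y, z in edges:
        successors.setdefault(y, set()).add(z)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Join only the new facts with edge/2, instead of all of path/2.
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        delta = derived - path
        path |= delta
    return path


closure = transitive_closure_seminaive({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))
```

The loop stops once a round produces no new facts, so it also terminates on cyclic graphs.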

List of keywords

Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Semantic Web

1219

Federated Graph Semantic and Structural Learning

Wenke Huang, Guancheng Wan, Mang Ye, Bo Du

Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most related work focuses on traditional distributed tasks such as images and voices and cannot handle graph structures. This paper first reveals that local client distortion is brought about by both node-level semantics and graph-level structure. First, for node-level semantics, we find that contrasting nodes from distinct classes helps provide well-performing discrimination. We pull the local node towards the global node of the same class and push it away from the global nodes of different classes. Second, we postulate that a well-structured graph neural network possesses similarity for neighbors due to the inherent adjacency relationships. However, aligning each node with adjacent nodes hinders discrimination due to the potential class inconsistency. We transform the adjacency relationships into the similarity distribution and leverage the global model to distill the relation knowledge into the local model, which preserves the structural information and discriminability of the local model. Empirical results on three graph datasets demonstrate the superiority of the proposed method over its counterparts.

List of keywords

Machine Learning -> ML: Federated learning
Machine Learning -> ML: Sequence and graph learning

1220

GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control

Yilin Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, Rui Pan

Multi-agent reinforcement learning (MARL) is increasingly popular for coordinating traffic lights (CTL), treating each intersection as an agent. However, existing MARL approaches either treat the agents as completely homogeneous, i.e., the same network and parameters for every agent, or as completely heterogeneous, i.e., different networks and parameters for each agent. This makes it difficult to balance accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agent environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions for maintaining a learnable and dynamic clustering: one applies mutual information estimation for better stability, and the other aims to maximize the separability between groups. Finally, GPLight enforces that the agents in a group share the same network and parameters. In this way, cooperation among agents of the same group reduces complexity, while different groups reflect the differences between agents to ensure accuracy. To verify the effectiveness of our method, we conducted experiments on both synthetic and real-world datasets, with up to 1,000 intersections. Compared with state-of-the-art methods, experimental results demonstrate the superiority of our proposed method, especially in large-scale CTL.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning

1223

Augmenting Automated Spectrum Based Fault Localization For Multiple Faults

Prantik Chatterjee, Jose Campos, Rui Abreu, Subhajit Roy

Spectrum-based Fault Localization (SBFL) uses the coverage of test cases and their outcome (pass/fail) to predict the "suspiciousness" of program components, e.g., lines of code. SBFL is, perhaps, the most successful fault localization technique due to its simplicity and scalability. However, SBFL heuristics do not perform well in scenarios where a program may have multiple faulty components. In this work, we propose a new algorithm that "augments" previously proposed SBFL heuristics to produce a ranked list where faulty components ranked low by base SBFL metrics are ranked significantly higher. We implement our ideas in a tool, ARTEMIS, that attempts to "bubble up" faulty components which are ranked lower by base SBFL metrics. We compare our technique to the most popular SBFL metrics and demonstrate statistically significant improvement in the developer effort for fault localization with respect to the basic strategies.
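To make the notion of a base SBFL metric concrete, the sketch below computes the Ochiai score, one widely used suspiciousness formula, from test coverage and pass/fail outcomes. The abstract does not say which base metrics ARTEMIS augments, so Ochiai is an assumption here, and the data and function name are hypothetical; this is not the ARTEMIS algorithm itself.

```python
import math


def ochiai(coverage, failed):
    """Ochiai suspiciousness per component.

    coverage[t] is the set of components executed by test t;
    failed[t] is True if test t failed.
    Score = ef / sqrt(total_failed * (ef + ep)), where ef / ep count the
    failing / passing tests that execute the component.
    """
    total_failed = sum(failed)
    components = set().union(*coverage)
    scores = {}
    for c in components:
        ef = sum(1 for cov, f in zip(coverage, failed) if c in cov and f)
        ep = sum(1 for cov, f in zip(coverage, failed) if c in cov and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[c] = ef / denom if denom else 0.0
    return scores


# Hypothetical spectrum: three tests over three lines of code.
coverage = [{"l1", "l2"}, {"l1", "l3"}, {"l2", "l3"}]
failed = [True, False, True]
scores = ochiai(coverage, failed)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores)
```

Here `l2` is executed by both failing tests and by no passing test, so it is ranked first.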

List of keywords

Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Multidisciplinary Topics and Applications -> MDA: Software engineering

1224

From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement

Wanyu Wu, Wei Wang, Zheng Wang, Kui Jiang, Xin Xu

Most existing Low-Light Image Enhancement (LLIE) methods are primarily designed to improve brightness in dark regions, which suffer from severe degradation in nighttime images. However, these methods have paid limited attention to another major source of visibility damage: the glow effects in real night scenes. Glow effects are inevitable in the presence of artificial light sources and cause further diffused blurring when directly enhanced. To address this issue, we treat the glow suppression task as learning physical glow generation via multiple scattering estimation according to the Atmospheric Point Spread Function (APSF). In response to the challenges posed by uneven glow intensity and varying source shapes, an APSF-based Nighttime Imaging Model with Near-field Light Sources (NIM-NLS) is specifically derived to design a scalable Light-aware Blind Deconvolution Network (LBDN). The glow-suppressed result is then brightened via a Retinex-based Enhancement Module (REM). Remarkably, the proposed glow suppression method is based on zero-shot learning and does not rely on any paired or unpaired training data. Empirical evaluations demonstrate the effectiveness of the proposed method in both glow suppression and low-light enhancement tasks.

List of keywords

Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Segmentation

1231

Rainbow Cycle Number and EFX Allocations: (Almost) Closing the Gap

Shayan Chashm Jahan, Masoud Seddighin, Seyed Mohammad Seyed Javadi, Mohammad Sharifi

Recently, some studies on the fair allocation of indivisible goods have noticed a connection between a purely combinatorial problem called the Rainbow Cycle problem and a fairness notion known as $\efx$: assuming that the rainbow cycle number for parameter $d$ (i.e. $\rainbow(d)$) is $O(d^\beta \log^\gamma d)$, we can find a $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(n^{\frac{\beta}{\beta+1}}\log^{\frac{\gamma}{\beta +1}} n)$ number of discarded goods \cite{chaudhury2021improving}. The best upper bound on $\rainbow(d)$ has been improved in a series of works to $O(d^4)$ \cite{chaudhury2021improving}, $O(d^{2+o(1)})$ \cite{berendsohn2022fixed}, and finally to $O(d^2)$ \cite{Akrami2022}.\footnote{We refer to the footnote at the end of the introduction for a short note on the result of \cite{Akrami2022}.} Also, via a simple observation, we have $\rainbow(d) \in \Omega(d)$ \cite{chaudhury2021improving}. In this paper, we introduce another problem in extremal combinatorics. For a parameter $\ell$, we define the rainbow path degree and denote it by $\ech(\ell)$. We show that any lower bound on $\ech(\ell)$ yields an upper bound on $\rainbow(d)$. Next, we prove that $\ech(\ell) \in \Omega(\ell^2/\log n)$ which yields an almost tight upper bound of $\rainbow(d) \in O(d \log d)$. This in turn proves the existence of a $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(\sqrt{n \log n})$ number of discarded goods. In addition, for the special case of the Rainbow Cycle problem in which the edges in each part form a permutation, we improve the upper bound to $\rainbow(d) \leq 2d-4$. We leverage $\ech(\ell)$ to achieve this bound. Our conjecture is that the exact value of $\ech(\ell)$ is $\lfloor \frac{\ell^2}{2} \rfloor -1$. We provide some experiments that support this conjecture. Assuming this conjecture is correct, we have $\rainbow(d) \in \Theta(d)$.
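The step from the bound on $\rainbow(d)$ to the number of discarded goods is a direct substitution into the reduction quoted at the start of the abstract. As a short worked check, reading the new upper bound as $d^{1}\log^{1} d$ (so $\beta = \gamma = 1$) gives the $\sqrt{n \log n}$ figure, and the earlier $O(d^2)$ bound ($\beta = 2$, $\gamma = 0$) gives $n^{2/3}$ by the same formula:

```latex
\[
O_{\epsilon}\!\left(n^{\frac{\beta}{\beta+1}} \log^{\frac{\gamma}{\beta+1}} n\right)\Big|_{\beta=1,\ \gamma=1}
= O_{\epsilon}\!\left(n^{1/2} \log^{1/2} n\right)
= O_{\epsilon}\!\left(\sqrt{n \log n}\right),
\qquad
O_{\epsilon}\!\left(n^{\frac{\beta}{\beta+1}} \log^{\frac{\gamma}{\beta+1}} n\right)\Big|_{\beta=2,\ \gamma=0}
= O_{\epsilon}\!\left(n^{2/3}\right).
\]
```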

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Multidisciplinary Topics and Applications -> MDA: Economics

1235

Independent Feature Decomposition and Instance Alignment for Unsupervised Domain Adaptation

Qichen He, Siying Xiao, Mao Ye, Xiatian Zhu, Ferrante Neri, Dongde Hou

Existing Unsupervised Domain Adaptation (UDA) methods typically attempt to perform knowledge transfer in a domain-invariant space explicitly or implicitly. In practice, however, the obtained features are often mixed with domain-specific information, which causes performance degradation. To overcome this fundamental limitation, this article presents a novel independent feature decomposition and instance alignment method (IndUDA for short). Specifically, based on an invertible flow, we project the base features into a decomposed latent space with domain-invariant and domain-specific dimensions. To drive semantic decomposition independently, we then swap the domain-invariant part across source and target domain samples of the same category and require their inverted features to be consistent at the class level with the original features. Treating domain-specific information as noise, we replace it with Gaussian noise and further regularize source model training by instance alignment, i.e., requiring the base features to be close to the corresponding reconstructed features. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on popular UDA benchmarks. The appendix and code are available at https://github.com/ayombeach/IndUDA.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1236

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

Daqian Shao, Marta Kwiatkowska

Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.

List of keywords

Machine Learning -> ML: Reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
Robotics -> ROB: Learning in robotics

1238

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

Shuhei Watanabe, Frank Hutter

Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms, and real-world applications often impose constraints, such as memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on 81 expensive HPO problems with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. See https://arxiv.org/abs/2211.14411 for the latest version with Appendix.

List of keywords

Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning

1239

Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

Shuhei Watanabe, Noor Awad, Masaki Onishi, Frank Hutter

Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on “Multiobjective Hyperparameter Optimization for Transformers”. See https://arxiv.org/abs/2212.06751 for the latest version.

List of keywords

Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Meta-learning

1241

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

Shuhei Watanabe, Archit Bansal, Frank Hutter

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient. See https://arxiv.org/abs/2304.10255 for the latest version.

List of keywords

Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning

1242

Null-Space Diffusion Sampling for Zero-Shot Point Cloud Completion

Xinhua Cheng, Nan Zhang, Jiwen Yu, Yinhuai Wang, Ge Li, Jian Zhang

Point cloud completion aims at estimating the complete data of objects from degraded observations. Although existing completion methods achieve impressive performance, they rely heavily on degraded-complete data pairs for supervision. In this work, we propose a novel framework named Null-Space Diffusion Sampling (NSDS) to solve the point cloud completion task in a zero-shot manner. By leveraging a pre-trained point cloud diffusion model as the off-the-shelf generator, our sampling approach can generate desired completion outputs with the guidance of the observed degraded data without any extra training. Furthermore, we propose a tolerant loop mechanism to improve the quality of completion results for hard cases. Experimental results demonstrate that our zero-shot framework achieves completion performance superior to unsupervised methods and competitive with supervised methods in various degraded situations.

List of keywords

Computer Vision -> CV: 3D computer vision

1250

Voice Guard: Protecting Voice Privacy with Strong and Imperceptible Adversarial Perturbation in the Time Domain

Jingyang Li, Dengpan Ye, Long Tang, Chuanxi Chen, Shengshan Hu

Adversarial examples are an emerging tool for voice privacy protection. By adding imperceptible noise to public audio, they prevent attackers from using zero-shot Voice Conversion (VC) to synthesize high-quality speech with the target speaker's identity. However, many existing studies ignore the human perception characteristics of audio data, and it is challenging to generate strong and imperceptible adversarial audio. In this paper, we propose the Voice Guard defense method, which moves the adversarial perturbation to the time domain to avoid the loss caused by cross-domain conversion. A psychoacoustic model is also introduced into the defense against VC for the first time, which greatly improves the disruption ability and concealment of adversarial audio. We also standardize the evaluation metrics of adversarial audio for the first time, combining multi-dimensional metrics to define the criteria for defense. We evaluate Voice Guard on several state-of-the-art zero-shot VC models. The experimental results show that our method can ensure the perceptual quality of adversarial audio while having a strong defense capability, and is far superior to previous works in terms of disruption ability and concealment.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Security and privacy
Natural Language Processing -> NLP: Speech
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods

1251

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmentary. The recent one-stage method aggregates only pixel contexts from image features to phrase features, which may incur semantic misalignment due to lacking object priors. To realize more comprehensive visual-linguistic interaction, we propose to enrich phrases with coupled pixel and object contexts by designing a Phrase-Pixel-Object Transformer Decoder (PPO-TD), where both fine-grained part details and coarse-grained entity clues are aggregated to phrase features. In addition, we also propose a Phrase-Object Contrastive Loss (POCL) to pull closer the matched phrase-object pairs and push away unmatched ones for aggregating more precise object contexts from more phrase-relevant object tokens. Extensive experiments on the PNG benchmark show our method achieves new state-of-the-art performance with large margins.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Segmentation

1252

Latent Processes Identification From Multi-View Time Series

Zenan Huang, Haobo Wang, Junbo Zhao, Nenggan Zheng

Understanding the dynamics of time-series data typically requires identifying the unique latent factors for data generation, a.k.a. latent processes identification. Driven by the independence assumption, existing works have made great progress in handling single-view data. However, extending them to multi-view time-series data is non-trivial because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independence assumption; (ii) the factors from different views generally overlap and are hard to aggregate into a complete set. In this work, we propose a novel framework, MuLTI, that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by establishing an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series.

List of keywords

Machine Learning -> ML: Causality
Machine Learning -> ML: Multi-view learning

1265

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering

Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Gupta, Anand Mishra

We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively different and more challenging than the traditionally studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset RETVQA and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Applications
Machine Learning -> ML: Multi-modal learning

1268

Towards Semantics- and Domain-Aware Adversarial Attacks

Jianping Zhang, Yung-chieh Huang, Weibin Wu, Michael Lyu

Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.

List of keywords

Natural Language Processing -> NLP: Interpretability and analysis of models for NLP

1272

Scaling Goal-based Exploration via Pruning Proto-goals

Akhil Bagaria, Tom Schaul

One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.

List of keywords

Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Deep reinforcement learning

1274

SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting

Yiduo Li, Zhongwen Rao, Zhe Li, Shiyi Qi, Lujia Pan, Zenglin Xu

Transformers have achieved remarkable performance in long time series forecasting (LTSF), thanks to their powerful capture of long-range dependencies. However, prediction on long time sequences is significantly affected by the ability to capture reliable local dependencies in segments of sequences. To address this issue, we introduce SMARTformer, which stands for SeMi-AutoRegressive Transformer with Efficient Integrated Window Attention. In detail, the semi-autoregressive (SAR) decoder first predicts each segment of the sequence iteratively to comprehensively capture local context in the manner of autoregressive (AR) decoding; based on the previous output, it then refines the whole sequence in a non-autoregressive (NAR) way. Therefore, SAR benefits from both the global horizon of NAR and the local detail capturing of AR. Moreover, it can be used as a general plug-in to further enhance the predictive performance of various transformer models on time series. Furthermore, to obtain complementary clues in local and enlarged receptive fields, we propose the Integrated Window Attention to separately conduct both local self-attention in multi-scale windows and global attention across windows. Notably, with linear complexity, this design also brings significant improvement in computational efficiency. Finally, extensive studies on five benchmark datasets show the effectiveness of SMARTformer against SOTA works, with an improvement of 10.2% and 18.4% in multivariate and univariate long-term forecasting, respectively.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Regression
Machine Learning -> ML: Time series and data streams

1310

Fairness via Group Contribution Matching

Tianlin Li, Zhiming Li, Anran Li, Mengnan Du, Aishan Liu, Qing Guo, Guozhu Meng, Yang Liu

Fairness issues in Deep Learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) in the training process to understand what causes such bias to develop. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples is drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address the above issues, we propose an easy but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that our GCM effectively improves fairness and outperforms other methods significantly.
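
The oversampling baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code and it does not implement GCM; the function name `balanced_batch` and its parameters are hypothetical, and sampling with replacement is an assumption made so that small subgroups can still fill their quota.

```python
import random
from collections import defaultdict


def balanced_batch(samples, group_of, per_group, rng=None):
    """Draw `per_group` samples from every subgroup (with replacement).

    `group_of` maps a sample to its sensitive-attribute value.
    All names here are illustrative assumptions, not from the paper.
    """
    rng = rng or random.Random()
    by_group = defaultdict(list)
    for s in samples:
        by_group[group_of(s)].append(s)
    batch = []
    for members in by_group.values():
        # Sampling with replacement lets a small subgroup reach its quota.
        batch.extend(rng.choices(members, k=per_group))
    rng.shuffle(batch)
    return batch
```

Each batch then holds the same number of samples per subgroup; the abstract's point is that equal counts alone do not guarantee equal contributions.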

List of keywords

AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity

1327

Diagram Visual Grounding: Learning to See with Gestalt-Perceptual Attention

Xin Hu, Lingling Zhang, Jun Liu, Xinyu Zhang, Wenjun Wu, Qianying Wang

Diagram visual grounding aims to capture the correlation between language expressions and local objects in the diagram, and plays an important role in applications such as textbook question answering and cross-modal retrieval. Most diagrams consist of several colors and simple geometries. This results in sparse low-level visual features, which further aggravates the gap between low-level visual and high-level semantic features of diagrams. This phenomenon poses challenges for diagram visual grounding. To solve the above issues, we propose a gestalt-perceptual attention model to align the diagram objects and language expressions. For low-level visual features, inspired by gestalt theory, which models the human visual system, we build a gestalt-perception graph network to complement the features learned by the traditional backbone network. For high-level semantic features, we design a multi-modal context attention mechanism to facilitate the interaction between diagrams and language expressions, so as to enhance the semantics of diagrams. Finally, guided by diagram features and linguistic embedding, the target query is gradually decoded to generate the coordinates of the referred object. By conducting comprehensive experiments on diagrams and natural images, we demonstrate that the proposed model achieves superior performance over the competitors. Our code will be released at https://github.com/AIProCode/GPA.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning

1340

From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping

Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang

With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval by the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks. This is due to the lack of decoder architecture and pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap exists between the CLIP representations of different modalities, and CLIP is unable to model the offset of this gap, which prevents concepts from transferring across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With vision-free unsupervised training, Knight achieves state-of-the-art performance in zero-shot methods for image captioning and video captioning.

List of keywords

Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Vision and language
Natural Language Processing -> NLP: Language generation

1361

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Wei Liu, Zhi-Qi Cheng, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie

Human pose estimation is a complicated structured data sequence modeling task. Most existing methods only consider the pair-wise interaction of human body joints in model learning. Unfortunately, this causes 3D pose estimation to fail in difficult cases such as joints overlapping and pose fast-changing, as pair-wise relations cannot exploit fine-grained human body priors in pose estimation. To this end, we revamped the 3D pose estimation framework with a High-order Directed Transformer (HDFormer), which coherently exploits the high-order relevances to boost the performance of pose estimation. Specifically, HDFormer adopts both self-attention and high-order attention schemes to build up a multi-order attention module to perform the information flow interaction including the first-order “joint↔joint”, second-order “bone↔joint” as well as high-order “hyperbone↔joint” relationships (hyperbone is defined as a joint set), compensating the hard cases prediction in fast-changing and heavy occlusion scenarios. Moreover, modernized CNN techniques are applied to upgrade the transformer-based architecture to speed up the HDFormer, achieving a favorable trade-off between effectiveness and efficiency. We compare our model with other SOTA models on the datasets Human3.6M and MPI-INF-3DHP. The results demonstrate that the proposed HDFormer achieves superior performance with only 1/10 parameters and much lower computational cost compared to the current SOTAs. Moreover, HDFormer can be applied to various types of real-world applications, enabling real-time and accurate 3D pose estimation (the source code is at https://shorturl.at/aISY0).

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Video analysis and understanding

1363

Artificial Agents Inspired by Human Motivation Psychology for Teamwork in Hazardous Environments

Anupama Arukgoda, Erandi Lakshika, Michael Barlow, Kasun Gunawardana

Multi-agent literature explores personifying artificial agents with personality, emotions or cognitive biases to produce “typical”, believable agents. In this study, we demonstrate the potential of endowing artificial agents with a motivation, using human implicit motivation psychology theory, which introduces three motive profiles (power, achievement and affiliation), to create diverse, risk-aware agents. We first devise a framework to model these motivated agents (or agents with any inherent behavior) that can activate different strategies depending on the circumstances. We conduct experiments on a fire-fighting task domain, evaluate how motivated teams perform, and draw conclusions on appropriate team compositions to be deployed in environments with different risk levels. Our framework generates predictable agents, as their resulting behaviors align with the inherent characteristics of their motives. We find that motivational diversity within teams is beneficial in dynamic collaborative environments, especially as the task risk level increases. Furthermore, we observed that the best composition, in terms of the performance metrics used to evaluate team compositions, does not remain the same as the collaboration level required to achieve goals changes. These results have implications for future designs of risk-aware autonomous teams and Human-AI teams, as they highlight the prospects of creating better artificial teammates and performance gains that could be achieved through anthropomorphized motivated agents.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Human-AI collaboration

1378

End-to-End Combinatorial Ensemble Learning

James Kotary, Vincenzo Di Vito Francesco, Ferdinando Fioretto

Ensemble learning is an important class of algorithms aimed at creating accurate and robust machine learning models by combining predictions from individual models. A key challenge in designing these algorithms is to find effective ways to combine the individual predictions for any particular input sample. This paper addresses this challenge and proposes an integration of constrained optimization and learning to derive specialized consensus rules. The resulting strategy learns to select appropriate predictors to combine for a particular input sample. The paper shows how to cast the ensemble learning task as a differentiable selection program which is trained end-to-end within the ensemble learning model. Results over several benchmarks demonstrate the ability of the proposed solution to substantially outperform common and advanced consensus rules in a variety of settings and learning tasks.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Machine Learning -> ML: Applications

1379

Folded Optimization in End-to-End Learning

James Kotary, My Dinh, Ferdinando Fioretto

The integration of constrained optimization models in deep learning has led to promising advances in both the machine learning and optimization domains. When an optimization problem has some undefined parameters, it can be viewed as a function that maps those parameters to corresponding optimal solutions. Such mappings are useful in learning constrained representations, for tasks that require special structure in the predictions or feature embeddings of a machine learning model. A primary challenge in learning with these integrated models is backpropagation through optimization mapping, which typically lacks a closed form. A common approach is unrolling, which relies on automatic differentiation through the operations of an iterative solver. While flexible and general, unrolling can encounter accuracy and efficiency issues in practice. These issues can be avoided by differentiating the optimization mapping analytically, but current frameworks impose rigid requirements on the optimization problem’s form. This paper provides theoretical insights into the backpropagation of unrolled optimizers, which lead to equivalent but efficiently solvable analytical models. Theoretically, it proposes a unifying view of unrolling and analytical differentiation through constrained optimization mappings.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Machine Learning -> ML: Applications
Machine Learning -> ML: Optimization

1384

Relation-enhanced DETR for Component Detection in Graphic Design Reverse Engineering

Xixuan Hao, Danqing Huang, Jieru Lin, Chin-Yew Lin

It is common practice for designers to create digital prototypes from a mock-up/screenshot. Reverse engineering graphic design by detecting its components (e.g., text, icon, button) helps expedite this process. This paper first conducts a statistical analysis to emphasize the importance of relations in graphic layouts, which further motivates us to incorporate relation modeling into component detection. Built on the current state-of-the-art DETR (DEtection TRansformer), we introduce a learnable relation matrix to model class correlations. Specifically, the matrix is added to the DETR decoder to update the query-to-query self-attention. Experimental results on three public datasets show that our approach achieves better performance than several strong baselines. We further visualize the learnt relation matrix and observe some reasonable patterns. Moreover, we show an application of component detection where we leverage the detection outputs as augmented training data for layout generation, which achieves promising results.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Arts and creativity
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)

1390

Reducing Communication for Split Learning by Randomized Top-k Sparsification

Fei Zheng, Chaochao Chen, Lingjuan Lyu, Binhui Yao

Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, to make the model generalize and converge better. This is done by selecting top-k elements with a large probability while also having a small probability to select non-top-k elements. Empirical results show that, compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves better model performance under the same compression level.
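
The selection rule described in this abstract can be sketched as follows. This is one possible reading and not the authors' implementation: ranking by absolute value, the mixing parameter `alpha`, and the function name `randomized_top_k` are all assumptions introduced for illustration.

```python
import random


def randomized_top_k(values, k, alpha=0.1, rng=None):
    """Keep k entries of `values` and zero the rest (assumes k <= len(values)).

    Each pick comes from the remaining top-k indices with probability
    1 - alpha and from the remaining non-top-k indices with probability
    alpha. `alpha` and the magnitude ranking are illustrative assumptions.
    """
    rng = rng or random.Random()
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    top, rest = order[:k], order[k:]
    chosen = []
    for _ in range(k):
        # Fall back to the top pool when no non-top-k index is left.
        pick_rest = bool(rest) and rng.random() < alpha
        pool = rest if pick_rest else top
        chosen.append(pool.pop(rng.randrange(len(pool))))
    keep = set(chosen)
    return [v if i in keep else 0.0 for i, v in enumerate(values)]
```

With `alpha=0` the sketch reduces to plain top-k sparsification; a small positive `alpha` occasionally lets a non-top-k element through.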

List of keywords

Machine Learning -> ML: Federated learning
Machine Learning -> ML: Learning sparse models
Multidisciplinary Topics and Applications -> MDA: Security and privacy

1412

Towards Generalizable Reinforcement Learning for Trade Execution

Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao

Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model the optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning

1424

Dynamic Group Link Prediction in Continuous-Time Interaction Network

Shijie Luo, He Li, Jianbin Huang

Recently, group link prediction has received increasing attention due to its important role in analyzing relationships between individuals and groups. However, most existing group link prediction methods emphasize static settings or only make cursory use of historical information, so they fail to obtain good performance in dynamic applications. To this end, we attempt to solve the group link prediction problem in continuous-time dynamic scenes with fine-grained temporal information. We propose a novel continuous-time group link prediction method, CTGLP, to capture the patterns of future link formation between individuals and groups. A new graph neural network, CTGNN, is presented to learn the latent representations of individuals by biasedly aggregating neighborhood information. Moreover, we design an importance-based group modeling function to model the embedding of a group based on its known members. CTGLP eventually learns a probability distribution and predicts the link target. Experimental results on various datasets with and without unseen nodes show that CTGLP outperforms the state-of-the-art methods by 13.4% and 13.2% on average.

List of keywords

Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
Multidisciplinary Topics and Applications -> MDA: Social sciences

1426

Contrastive Learning and Reward Smoothing for Deep Portfolio Management

Yun-Hsuan Lien, Yuan-Kui Li, Yu-Shuen Wang

In this study, we used reinforcement learning (RL) models to invest assets in order to earn returns. The models were trained to interact with a simulated environment based on historical market data and learn trading strategies. However, using deep neural networks based on the returns of each period can be challenging due to the unpredictability of financial markets. As a result, the policies learned from training data may not be effective when tested in real-world situations. To address this issue, we incorporated contrastive learning and reward smoothing into our training process. Contrastive learning allows the RL models to recognize patterns in asset states that may indicate future price movements. Reward smoothing, on the other hand, serves as a regularization technique to prevent the models from seeking immediate but uncertain profits. We tested our method against various traditional financial techniques and other deep RL methods, and found it to be effective in both the U.S. stock market and the cryptocurrency market. Our source code will be made available for public access.

List of keywords

Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Relational learning
Machine Learning -> ML: Representation learning

1441

ViT-P3DE∗: Vision Transformer Based Multi-Camera Instance Association with Pseudo 3D Position Embeddings

Minseok Seo, Hyuk-Jae Lee, Xuan Truong Nguyen

Multi-camera instance association, which identifies identical objects among multiple objects in multi-view images, is challenging due to several harsh constraints. To tackle this problem, most studies have employed CNNs as feature extractors but often fail under such harsh constraints. Inspired by Vision Transformer (ViT), we first develop a pure ViT-based framework for robust feature extraction through self-attention and residual connection. We then propose two novel methods to achieve robust feature learning. First, we introduce learnable pseudo 3D position embeddings (P3DEs) that represent the 3D location of an object in the world coordinate system, which is independent of the harsh constraints. To generate P3DEs, we encode the camera ID and the object’s 2D position in the image using embedding tables. We then build a framework that trains P3DEs to represent an object’s 3D position in a weakly supervised manner. Second, we also utilize joint patch generation (JPG). During patch generation, JPG considers an object and its surroundings as a single input patch to reinforce the relationship information between two features. Ultimately, experimental results demonstrate that both ViT-P3DE and ViT-P3DE with JPG achieve state-of-the-art performance and significantly outperform existing works, especially when dealing with extremely harsh constraints.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)

1442

Black-Box Data Poisoning Attacks on Crowdsourcing

Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin

Understanding the vulnerability of label aggregation against data poisoning attacks is key to ensuring data quality in crowdsourced label collection. State-of-the-art attack mechanisms generally assume full knowledge of the aggregation models while failing to consider the flexibility of malicious workers in selecting which instances to label. Such a setup limits the applicability of the attack mechanisms and impedes further improvement of their success rate. This paper introduces a black-box data poisoning attack framework that finds the optimal strategies for instance selection and labeling to attack unknown label aggregation models in crowdsourcing. We formulate the attack problem on top of a generic formalization of label aggregation models and then introduce a substitution approach that attacks a substitute aggregation model in replacement of the unknown model. Through extensive validation on multiple real-world datasets, we demonstrate the effectiveness of both instance selection and model substitution in improving the success rate of attacks.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Robustness

1456

Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond

Zhu Liu, Jinyuan Liu, Guanyao Wu, Long Ma, Xin Fan, Risheng Liu

Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts always consider boosting a single task unilaterally and neglect others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish a hierarchical dual-task-driven deep model to bridge these tasks. Concretely, we first construct an image fusion module to fuse complementary characteristics and cascade dual task-related modules, including a discriminator for visual effects and a semantic network for feature measurement. We provide a bi-level perspective to formulate image fusion and follow-up downstream tasks. To incorporate distinct task-related responses for image fusion, we consider image fusion as a primary goal and dual modules as learnable constraints. Furthermore, we develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning. Extensive experiments demonstrate the superiority of our method, which not only produces visually pleasant fused results but also yields significant improvements in detection and segmentation over state-of-the-art approaches.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding

1461

SWAT: Spatial Structure Within and Among Tokens

Kumara Kahatapitiya, Michael Ryoo

Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years. Such methods usually have a common pipeline: a tokenization method, followed by a set of layers/blocks for information mixing, both within and among tokens. When image patches are converted into tokens, they are often flattened, discarding the spatial structure within each patch. As a result, any processing that follows (e.g., multi-head self-attention) may fail to recover and/or benefit from such information. In this paper, we argue that models can have significant gains when spatial structure is preserved during tokenization, and is explicitly used during the mixing stage. We propose two key contributions: (1) Structure-aware Tokenization and (2) Structure-aware Mixing, both of which can be combined with existing models with minimal effort. We introduce a family of models (SWAT), showing improvements over the likes of DeiT, MLP-Mixer and Swin Transformer, across multiple benchmarks including ImageNet classification and ADE20K segmentation. Our code and models will be released online.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning

1476

Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting

Ren-Jian Wang, Ke Xue, Haopu Shang, Chao Qian, Haobo Fu, Qiang Fu

Quality-Diversity (QD) algorithms, a subset of evolutionary algorithms, maintain an archive (i.e., a set of solutions) and simulate the natural evolution process through iterative selection and reproduction, with the goal of generating a set of high-quality and diverse solutions. Though having found many successful applications in reinforcement learning, QD algorithms often select the parent solutions uniformly at random, which lacks selection pressure and may limit the performance. Recent studies have treated each type of behavior of a solution as an objective, and selected the parent solutions based on Multi-objective Optimization (MO), which is a natural idea, but has not led to satisfactory performance as expected. This paper gives the reason for the first time, and then proposes a new MO-based selection method by non-surrounded-dominated sorting (NSS), which considers all possible directions of the behaviors, and thus can generate diverse solutions over the whole behavior space. By combining NSS with the most widespread QD algorithm, MAP-Elites, we perform experiments on synthetic functions and several complex tasks (i.e., QDGym, robotic arm, and Mario environment generation), showing that NSS achieves better performance than not only other MO-based selection methods but also state-of-the-art selection methods in QD.

List of keywords

Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Reinforcement learning
Search -> S: Evolutionary computation

1479

Stability and Generalization of $\ell_p$-Regularized Stochastic Learning for GCN

Shaogao Lv, Shiyu Liu, Linsen Wei, Ming Li

Graph convolutional networks (GCN) are viewed as one of the most popular representations among the variants of graph neural networks over graph data, and have shown powerful performance in empirical experiments. $\ell_2$-based graph smoothing enforces global smoothness of GCN, while (soft) $\ell_1$-based sparse graph learning tends to promote signal sparsity at the cost of discontinuity. The current paper aims at quantifying the trade-off of GCN between smoothness and sparsity, with the help of a general $\ell_p$-regularized $(1<p\leq 2)$ stochastic learning scheme proposed in the paper. While stability-based generalization analyses have been given in prior work for a twice-differentiable objective function, our $\ell_p$-regularized learning scheme does not satisfy such a smoothness condition. To address this issue, we propose a novel SGD proximal algorithm for GCN with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with the $\ell_p$-regularized stochastic learning by analyzing the stability of our SGD proximal algorithm. Several empirical experiments are implemented to validate our theoretical findings.

List of keywords

Uncertainty in AI -> UAI: Graphical models

1493

On the Reuse Bias in Off-Policy Reinforcement Learning

Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu

Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable, and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias of IS: the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that the off-policy evaluation and optimization of the current policy with the data from the replay buffer result in an overestimation of the objective, which may cause an erroneous gradient update and degrade the performance. We further provide a high-probability upper bound of the Reuse Bias and show that controlling one term of the upper bound can control the Reuse Bias by introducing the concept of stability for off-policy algorithms. Based on these analyses, we present a novel yet simple Bias-Regularized Importance Sampling (BIRIS) framework along with practical algorithms, which can alleviate the negative impact of the Reuse Bias, and show that our BIRIS can significantly reduce the Reuse Bias empirically. Moreover, extensive experimental results show that our BIRIS-based methods can significantly improve the sample efficiency on a series of continuous control tasks in MuJoCo.
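
For readers unfamiliar with the re-weighting this abstract refers to, a minimal sketch of the ordinary trajectory-wise importance sampling estimator follows. It shows only the textbook estimator, not BIRIS or the Reuse Bias analysis; the function name `is_estimate` and the input layout are assumptions made for illustration.

```python
from math import prod


def is_estimate(trajectories):
    """Ordinary importance-sampling estimate of the target policy's return.

    `trajectories` is a list of (ratios, ret) pairs, where `ratios` holds the
    per-step probability ratios pi(a|s) / mu(a|s) between the target policy
    pi and the behavior policy mu, and `ret` is the trajectory's return.
    This layout is a hypothetical choice for the sketch.
    """
    weighted = [prod(ratios) * ret for ratios, ret in trajectories]
    return sum(weighted) / len(weighted)
```

When the target and behavior policies coincide, every ratio is 1 and the estimate reduces to the plain average return of the buffer.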

List of keywords

Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes

1505

Clustered-patch Element Connection for Few-shot Learning

Jinxiang Lai, Siqian Yang, Junhong Zhou, Wenlong Wu, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Chengjie Wang

The weak feature representation problem has long limited the performance of few-shot classification. To alleviate this problem, recent researchers build connections between support and query instances through embedding patch features to generate discriminative representations. However, we observe that there exist semantic mismatches (foreground/background) among these local patches, because the location and size of the target object are not fixed. What is worse, these mismatches result in unreliable similarity confidences, and complex dense connection exacerbates the problem. Accordingly, we propose a novel Clustered-patch Element Connection (CEC) layer to correct the mismatch problem. The CEC layer leverages Patch Cluster and Element Connection operations to collect and establish reliable connections with high similarity patch features, respectively. Moreover, we propose CECNet, which includes a CEC-layer-based attention module and a CEC-based distance metric. The former is utilized to generate a more discriminative representation benefiting from the global clustered-patch features, and the latter is introduced to reliably measure the similarity between pair-features. Extensive experiments demonstrate that our CECNet outperforms the state-of-the-art methods on multiple classification benchmark datasets. Furthermore, our CEC approach can be extended to few-shot segmentation and detection tasks and achieves competitive improvements.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1510

Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification

Zhiwei Wang, Junlin Xian, Kangyi Liu, Xin Li, Qiang Li, Xin Yang

Mammogram images are important for breast cancer screening and are typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information for clinical decisions. However, previous methods mostly learn features from the two views independently, which violates clinical knowledge and ignores the importance of dual-view correlation in feature learning. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep feature maps for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly correlated with each other. Experimental results on two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms previous state-of-the-art methods for classifying a whole mammogram as malignant or not.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Multi-view learning

1529

Image Composition with Depth Registration

Zan Li, Wencheng Wang, Fei Hou

Handling occlusions is still a challenging problem for image composition. Existing approaches either require the source contents to be completely in front of the target contents or need manual intervention to adjust occlusions, which is very tedious. Though several methods have suggested exploiting priors or learning techniques to improve occlusion determination, their potential is quite limited. This paper addresses the challenge by presenting a depth registration method for merging the source contents seamlessly into the 3D space that the target image represents. Thus, the occlusions between the source contents and target contents can be conveniently handled through pixel-wise depth comparisons, allowing the user to focus more efficiently on the design of the image composition. Experimental results show that we can conveniently handle occlusions in image composition and improve efficiency by about 4 times compared to Photoshop.
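
The pixel-wise depth comparison mentioned above can be sketched as follows. This is an illustration only, assuming the source depth has already been registered into the target's depth space and that smaller depth values are nearer to the camera; the function `composite` and its argument layout (nested lists indexed as `[row][column]`) are hypothetical, and the registration step itself is not shown.

```python
def composite(source, source_depth, source_mask, target, target_depth):
    """Per pixel, keep the source only where it is masked in and nearer than the target."""
    out = []
    for y, row in enumerate(target):
        out_row = []
        for x, target_pixel in enumerate(row):
            # Assumption: both depth maps share one scale, smaller means nearer.
            visible = source_mask[y][x] and source_depth[y][x] < target_depth[y][x]
            out_row.append(source[y][x] if visible else target_pixel)
        out.append(out_row)
    return out
```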

List of keywords

Computer Vision -> CV: Scene analysis and understanding
Multidisciplinary Topics and Applications -> MDA: Arts and creativity

1539

IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse

Yang Li, Liangliang Shi, Junchi Yan

Despite their success, generative adversarial networks (GANs) still suffer from mode collapse, i.e., the generator can only map latent variables to a partial set of modes in the target distribution. In this paper, we analyze and seek to regularize this issue with an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property referring to the target distribution for generation can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples {z} obey IID, the generations {G(z)} may not necessarily be IID samples from the target distribution. Based on this observation, considering a necessary condition of IID generation, we propose a new loss to encourage the closeness between the inverse samples of real data and the Gaussian source in the latent space to regularize the generation to be IID from the target distribution. The logic is that the inverse samples from target data should also be IID in the source distribution. Experiments on both synthetic and real-world data show the effectiveness of our model.

List of keywords

Machine Learning -> ML: Generative adversarial networks
Computer Vision -> CV: Neural generative models, auto encoders, GANs

1540

KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification

Likang Wu, Junji Jiang, Hongke Zhao, Hao Wang, Defu Lian, Mengdi Zhang, Enhong Chen

Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features’ prototypes and labels’ semantics, thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e., the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It is necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. The content of each node is then reconstructed into a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph’s instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF in comparison with state-of-the-art baselines.

List of keywords

Data Mining -> DM: Applications
Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Mining graphs

1542

Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

Liqi Yan, Cheng Han, Zenglin Xu, Dongfang Liu, Qifan Wang

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether prompts in fine-tuning can learn knowledge-aware prompts from pre-training, by designing two different sets of prompts in the pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt) approach for video captioning, which first efficiently pre-trains a video-language model to extract key information (e.g., actions and objects) with a flexibly generated Knowledge-Aware Prompt (KAP). Then, we design a Video-Language Prompt (VLP) to transfer the knowledge from the knowledge-aware prompts and fine-tune the model to generate full captions. Experimental results show the superior performance of our approach over several state-of-the-art baselines. We further demonstrate that the video-language prompts are well learned from the knowledge-aware prompts.

List of keywords

Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Vision and language

1560

Abstraction of Nondeterministic Situation Calculus Action Theories

Bita Banihashemi, Giuseppe De Giacomo, Yves Lesperance

We develop a general framework for abstracting the behavior of an agent that operates in a nondeterministic domain, i.e., where the agent does not control the outcome of the nondeterministic actions, based on the nondeterministic situation calculus and the ConGolog programming language. We assume that we have both an abstract and a concrete nondeterministic basic action theory, and a refinement mapping which specifies how abstract actions, decomposed into agent actions and environment reactions, are implemented by concrete ConGolog programs. This new setting supports strategic reasoning and strategy synthesis, by allowing us to quantify separately on agent actions and environment reactions. We show that if the agent has a (strong FOND) plan/strategy to achieve a goal/complete a task at the abstract level, and it can always execute the nondeterministic abstract actions to completion at the concrete level, then there exists a refinement of it that is a (strong FOND) plan/strategy to achieve the refinement of the goal/task at the concrete level.

List of keywords

Knowledge Representation and Reasoning -> KRR: Reasoning about actions
Agent-based and Multi-agent Systems -> MAS: Agent theories and models

1572

Overlooked Implications of the Reconstruction Loss for VAE Disentanglement

Nathan Michlo, Richard Klein, Steven James

Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.

List of keywords

Machine Learning -> ML: Representation learning
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Unsupervised learning

1580

On Conditional and Compositional Language Model Differentiable Prompting

Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer

Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (ProPS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules — neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that ProPS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Neuro-symbolic methods

1585

Domain-Adaptive Self-Supervised Face & Body Detection in Drawings

Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin

Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Self-supervised Learning

1588

Accurate MRI Reconstruction via Multi-Domain Recurrent Networks

Jinbao Wei, Zhijie Wang, Kongqiao Wang, Li Guo, Xueyang Fu, Ji Liu, Xun Chen

In recent years, deep convolutional neural networks (CNNs) have become dominant in MRI reconstruction from undersampled k-space. However, most existing CNN methods reconstruct the undersampled images either in the spatial domain or in the frequency domain, neglecting the correlation between these two domains. This hinders further improvement in reconstruction performance. To tackle this issue, we propose a new multi-domain recurrent network (MDR-Net) with multi-domain learning (MDL) blocks as its basic units to reconstruct the undersampled MR image progressively. Specifically, the MDL block interactively processes the local spatial features and the global frequency information to facilitate complementary learning, leading to fine-grained feature generation. Furthermore, we introduce an effective frequency-based loss to narrow the frequency spectrum gap, compensating for over-smoothness caused by the widely used spatial reconstruction loss. Extensive experiments on public fastMRI datasets demonstrate that our MDR-Net consistently outperforms other competitive methods and is able to provide more details.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Applications

1590

A Fast Algorithm for Consistency Checking Partially Ordered Time

Leif Eriksson, Victor Lagerkvist

Partially ordered models of time occur naturally in applications where agents/processes cannot perfectly communicate with each other, and can be traced back to the seminal work of Lamport. In this paper we consider the problem of deciding if a (likely incomplete) description of a system of events is consistent, the network consistency problem for the point algebra of partially ordered time (POT). While the classical complexity of this problem has been fully settled, comparatively little is known of the fine-grained complexity of POT except that it can be solved in O*((0.368n)^n) time by enumerating ordered partitions. We construct a much faster algorithm with a run-time bounded by O*((0.26n)^n), which, e.g., is roughly 1000 times faster than the naive enumeration algorithm in a problem with 20 events. This is achieved by a sophisticated enumeration of structures similar to total orders, which are then greedily expanded toward a solution. While similar ideas have been explored earlier for related problems, it turns out that the analysis for POT is non-trivial and requires significant new ideas.
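
The "roughly 1000 times faster" figure for 20 events can be checked with a short calculation, since the ratio of the two bounds simplifies to (0.368/0.26)^n. The sketch below compares only the dominant terms and ignores the polynomial factors hidden by the O* notation; the function name `bound_ratio` is chosen here for illustration.

```python
def bound_ratio(n, slow_c=0.368, fast_c=0.26):
    """Ratio (slow_c * n)^n / (fast_c * n)^n, which simplifies to (slow_c / fast_c)^n."""
    return (slow_c / fast_c) ** n


# For n = 20 events the ratio is on the order of one thousand.
print(bound_ratio(20))
```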

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

1593

Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia

Scene text image super-resolution (STISR), aiming to improve image quality while boosting scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, while neglecting the disturbance from the complex background, thus limiting the performance. To address this issue, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into super-resolution branch to reconstruct high-quality recognizable scene text images. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision

1596

RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search

Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang

Text-based person search aims to retrieve the specified person images given a textual description. The key to tackling such a challenging task is to learn powerful multi-modal representations. Towards this, we propose a Relation and Sensitivity aware representation learning method (RaSa), including two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). For one thing, existing methods cluster representations of all positive pairs without distinction and overlook the noise problem caused by the weak positive pairs where the text and the paired image have noise correspondences, thus leading to overfitting learning. RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs). For another thing, learning invariant representation under data augmentation (i.e., being insensitive to some transformations) is a general practice for improving representation’s robustness in existing methods. Beyond that, we encourage the representation to perceive the sensitive transformation by SA (i.e., learning to detect the replaced words), thus promoting the representation’s robustness. Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively. Our code will be released.

List of keywords

Computer Vision -> CV: Vision and language

1598

Improved Algorithms for Allen’s Interval Algebra by Dynamic Programming with Sublinear Partitioning

Leif Eriksson, Victor Lagerkvist

Allen’s interval algebra is one of the most well-known calculi in qualitative temporal reasoning with numerous applications in artificial intelligence. Very recently, there has been a surge of improvements in the fine-grained complexity of NP-hard reasoning tasks in this algebra, which has improved the running time from the naive 2^O(n^2) to O*((1.0615n)^n), and even faster algorithms are known for unit intervals and the case when we have a bounded number of overlapping intervals. Despite these improvements the best known lower bound is still only 2^o(n) under the exponential-time hypothesis and major improvements in either direction seemingly require fundamental advances in computational complexity. In this paper we propose a novel framework for solving NP-hard qualitative reasoning problems which we refer to as dynamic programming with sublinear partitioning. Using this technique we obtain a major improvement of O*((cn/log(n))^n) for Allen’s interval algebra. To demonstrate that the technique is applicable to further problem domains we apply it to a problem in qualitative spatial reasoning, the cardinal direction calculus, and solve it in O*((cn/log(n))^(2n/3)) time. Hence, not only do we significantly advance the state-of-the-art for NP-hard qualitative reasoning problems, but we also obtain a novel algorithmic technique that is likely applicable to many problems where 2^O(n) time algorithms are unlikely.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

1603

Error in the Euclidean Preference Model

Luke Thorburn, Maria Polukarov, Carmine Ventre

Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their "ideal point", as measured by the Euclidean metric. However, Bogomolnaia & Laslier (2007) showed that there exist ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are realistic situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.
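
To make the Euclidean preference model concrete, the sketch below ranks alternatives by their Euclidean distance to an individual's ideal point. The function name `euclidean_ranking` and the dictionary layout are assumptions made for illustration, not notation from the paper.

```python
from math import dist


def euclidean_ranking(ideal_point, alternatives):
    """Rank alternative names from most to least preferred.

    alternatives: dict mapping a name to its coordinates (a tuple).
    Closer to the ideal point means more preferred.
    """
    return sorted(alternatives, key=lambda name: dist(ideal_point, alternatives[name]))


alternatives = {"a": (0.0,), "b": (1.0,), "c": (3.0,)}
print(euclidean_ranking((0.9,), alternatives))
```

With the three alternatives placed on a line at 0, 1 and 3, no ideal point on that line yields the order a, c, b, because b lies between a and c. This is a small one-dimensional instance of an ordinal preference that the model cannot represent without more dimensions.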

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Machine Learning -> ML: Learning preferences or rankings

1604

A Fast Adaptive Randomized PCA Algorithm

Xu Feng, Wenjian Yu

It is desirable to adaptively determine the number of dimensions (rank) for PCA according to a given tolerance of low-rank approximation error. In this work, we aim to develop a fast algorithm solving this adaptive PCA problem. We propose to replace the QR factorization in the randQB_EI algorithm with matrix multiplication and inversion of small matrices, and propose a new error indicator to incrementally evaluate the approximation error in Frobenius norm. Combining the shifted power iteration technique for better accuracy, we finally build up an algorithm named farPCA. Experimental results show that farPCA is much faster than the baseline methods (randQB_EI, randUBV and svds) in the practical setting of multi-thread computing, while producing nearly optimal results of adaptive PCA.

List of keywords

Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Data Mining -> DM: Big data and scalability
Data Mining -> DM: Theoretical foundations of data mining

1621

Specifying and Testing k-Safety Properties for Machine-Learning Models

Maria Christakis, Hasan Ferit Eniser, Jörg Hoffmann, Adish Singla, Valentin Wüstholz

Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about k different executions—so-called k-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of k-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.
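
The loan example above relates two executions of the same model, which is what makes it a 2-safety property. A minimal sketch of checking it on one input pair follows; `approve_loan` is a hypothetical stand-in model and `violates_2_safety` is an illustrative helper, neither taken from the paper's specification language or framework.

```python
def approve_loan(income, debt):
    """Hypothetical stand-in model: approve when income is at least twice the debt."""
    return income >= 2 * debt


def violates_2_safety(model, applicant, lower_income):
    """Return True if 'denied, then income decreases => still denied' is broken."""
    income, debt = applicant
    if lower_income > income:
        raise ValueError("the follow-up input must not raise income")
    first = model(income, debt)         # execution 1: original applicant
    second = model(lower_income, debt)  # execution 2: same applicant, lower income
    return (not first) and second


print(violates_2_safety(approve_loan, (30, 20), 10))
```

A metamorphic tester would generate many such input pairs automatically and report every pair for which the check returns True.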

List of keywords

Multidisciplinary Topics and Applications -> MDA: Software engineering
Agent-based and Multi-agent Systems -> MAS: Engineering methods, platforms, languages and tools
AI Ethics, Trust, Fairness -> ETF: Safety and robustness

1624

Hierarchical Transformer for Scalable Graph Learning

Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang

Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency.

List of keywords

Machine Learning -> ML: Sequence and graph learning

1626

PowerBEV: A Powerful yet Lightweight Framework for Instance Prediction in Bird’s-Eye View

Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, Jürgen Gall

Accurately perceiving instances and predicting their future motion are key tasks for autonomous vehicles, enabling them to navigate safely in complex urban traffic. While bird’s-eye view (BEV) representations are commonplace in perception for autonomous driving, their potential in a motion prediction setting is less explored. Existing approaches for BEV instance prediction from surround cameras rely on a multi-task auto-regressive setup coupled with complex post-processing to predict future instances in a spatio-temporally consistent manner. In this paper, we depart from this paradigm and propose an efficient novel end-to-end framework named PowerBEV, which differs in several design choices aimed at reducing the inherent redundancy in previous methods. First, rather than predicting the future in an auto-regressive fashion, PowerBEV uses a parallel, multi-scale module built from lightweight 2D convolutional networks. Second, we show that segmentation and centripetal backward flow are sufficient for prediction, simplifying previous multi-task objectives by eliminating redundant output modalities. Building on this output representation, we propose a simple, flow warping-based post-processing approach which produces more stable instance associations across time. Through this lightweight yet powerful design, PowerBEV outperforms state-of-the-art baselines on the NuScenes Dataset and poses an alternative paradigm for BEV instance prediction. Code will be released upon publication.

List of keywords

Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Segmentation

1630

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie

Streaming perception is a vital aspect of autonomous driving, yet previous research has lacked systematic examination. To address this, we propose the optimized framework, DAMO-StreamNet, which incorporates recent advances from the YOLO series and conducts a comprehensive analysis of spatial and temporal perception mechanisms to provide a state-of-the-art solution. The key innovations of DAMO-StreamNet include 1) utilization of a robust neck structure incorporating deformable convolution, which improves the receptive field and feature alignment abilities, 2) introduction of a dual-branch structure for extracting longer time-series information, resulting in improved prediction accuracy for motion states, 3) distillation at the logits level, which aligns the logits of the teacher and student models to the semantic space for more efficient optimization, and 4) a real-time forecasting mechanism that updates support frame features with the current frame before the next prediction in the inference phase to handle real-time streaming perception. Our experiments have shown that DAMO-StreamNet outperforms existing SOTA methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using any extra data. This work not only establishes a new benchmark for streaming perception but also provides valuable insights for future research. Moreover, DAMO-StreamNet can be applied to various types of autonomous systems, such as drones and robots, enabling real-time and accurate perception of the environment.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)

1633

Differentiable Economics for Randomized Affine Maximizer Auctions

Michael Curry, Tuomas Sandholm, John Dickerson

A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, such an architecture does not exist. There are single-bidder approaches (MenuNet, RochetNet) which are always strategyproof and can represent optimal mechanisms. RegretNet is multi-bidder and can approximate any mechanism, but is only approximately strategyproof. We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism. This architecture is the classic affine maximizer auction (AMA), modified to offer lotteries. By using the gradient-based optimization tools of differentiable economics, we can now train lottery AMAs, competing with or outperforming prior approaches in revenue.
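
For readers unfamiliar with affine maximizer auctions, the sketch below implements the classic AMA rule over a fixed menu of outcomes: choose the outcome that maximizes the weighted sum of reported values plus a per-outcome boost, and charge each bidder the weighted externality they impose on the others. The name `run_ama` and the data layout are assumptions for illustration. In a lottery AMA, each menu entry would be a randomized allocation and `values[i][a]` an expected value; how the paper trains the weights, boosts and menu by gradient descent is not shown.

```python
def run_ama(values, weights, boosts):
    """Classic affine maximizer auction over a finite menu.

    values[i][a]: bidder i's reported value for menu outcome a.
    weights[i] > 0: bidder weights; boosts[a]: per-outcome boosts.
    Returns the chosen outcome index and the list of payments.
    """
    outcomes = range(len(boosts))
    bidders = range(len(values))

    def score(a, skip=None):
        # Boosted weighted welfare of outcome a, optionally leaving one bidder out.
        return boosts[a] + sum(weights[i] * values[i][a] for i in bidders if i != skip)

    chosen = max(outcomes, key=score)
    payments = []
    for i in bidders:
        best_without_i = max(score(a, skip=i) for a in outcomes)
        payments.append((best_without_i - score(chosen, skip=i)) / weights[i])
    return chosen, payments


# One item, two bidders; menu: give to bidder 0, give to bidder 1, give to nobody.
print(run_ama([[10, 0, 0], [0, 6, 0]], weights=[1, 1], boosts=[0, 0, 0]))
```

With unit weights and zero boosts the rule reduces to a VCG (second-price) auction.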

List of keywords

1635

A Bitwise GAC Algorithm For Alldifferent Constraints

Zhe Li, Yaohua Wang, Zhanshan Li

The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is finding and enumerating all the strongly connected components (SCCs) of the graph model. This requires a large number of complex data structures to maintain the node information, leading to a large overhead in both time and memory. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. To solve this problem, the key observation of this paper is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCC it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead and makes it easier to combine existing optimizations so that they work synergistically. Our experiments show that AllDiffbit outperforms the state-of-the-art GAC algorithms by over 60%.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction

1639

Bipolar Abstract Dialectical Frameworks Are Covered by Kleene’s Three-Valued Logic

Ringo Baumann, Maximilian Heinrich

Abstract dialectical frameworks (ADFs) are one of the most powerful generalizations of classical Dung-style argumentation frameworks (AFs). The additional expressive power comes with an increase in computational complexity, namely one level up in the polynomial hierarchy in comparison to their AF counterparts. However, there is one important subclass, so-called bipolar ADFs (BADFs), which are as complex as classical AFs while offering strictly more modeling capacities. This property makes BADFs very attractive from a knowledge representation point of view and is the main reason why this class has received much attention recently. The semantics of ADFs rely on the Gamma-operator, which takes a three-valued interpretation as input and returns a new one. However, in order to obtain the output, the original definition requires considering every two-valued completion of the given three-valued interpretation. In this paper we formally prove that in the case of BADFs we may bypass this computationally intensive procedure by applying Kleene’s three-valued logic K. To this end, we introduce the so-called bipolar disjunctive normal form, which is simply a disjunctive normal form where every atom used possesses either a positive or a negative polarity. We then show that, first, this normal form is expressive enough to represent any BADF and, second, the computation can be done via Kleene’s K instead of dealing with two-valued completions. Inspired by the main correspondence result, we present some first experiments showing the computational benefit of using Kleene.
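
To make the Kleene evaluation step concrete, here is a minimal sketch of evaluating a disjunctive normal form under Kleene's strong three-valued logic, where negation is `1 - x`, conjunction is `min`, and disjunction is `max` over the values false (0.0), undefined (0.5), and true (1.0). The encoding of a formula as a list of terms, each a list of `(atom, is_positive)` pairs, and the name `eval_dnf` are assumptions for illustration, not the paper's implementation. The example formula uses each atom with a single polarity, matching the bipolar normal form described above.

```python
TRUE, UNDEF, FALSE = 1.0, 0.5, 0.0


def kleene_not(x):
    # Strong Kleene negation: swaps TRUE and FALSE, keeps UNDEF.
    return 1.0 - x


def eval_dnf(dnf, interp):
    """Evaluate a DNF under a three-valued interpretation.

    dnf:    list of terms; each term is a list of (atom, is_positive) pairs
    interp: dict mapping each atom to TRUE, UNDEF, or FALSE
    """
    term_values = []
    for term in dnf:
        literals = [
            interp[atom] if positive else kleene_not(interp[atom])
            for atom, positive in term
        ]
        # Conjunction is the minimum; an empty conjunction is TRUE.
        term_values.append(min(literals, default=TRUE))
    # Disjunction is the maximum; an empty disjunction is FALSE.
    return max(term_values, default=FALSE)


# (a AND NOT b) OR c, with a and c used positively and b negatively.
formula = [[("a", True), ("b", False)], [("c", True)]]
```

Under `{"a": TRUE, "b": UNDEF, "c": FALSE}` the formula evaluates to `UNDEF`, and a single pass over the formula replaces the enumeration of both two-valued completions of `b`.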

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning

1654

SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation

Shuyi Ouyang, Hongyi Wang, Shiao Xie, Ziwei Niu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin

Referring image segmentation aims to segment an object out of an image via a specific language expression. The main concept is establishing global visual-linguistic relationships to locate the object and identify boundaries using details of the image. Recently, various Transformer-based techniques have been proposed to efficiently leverage long-range cross-modal dependencies, enhancing performance for referring segmentation. However, existing methods consider visual feature extraction and cross-modal fusion separately, resulting in insufficient visual-linguistic alignment in semantic space. In addition, they employ sequential structures and hence lack multi-scale information interaction. To address these limitations, we propose a Scale-Wise Language-Guided Vision Transformer (SLViT) with two appealing designs: (1) Language-Guided Multi-Scale Fusion Attention, a novel attention mechanism module for extracting rich local visual information and modeling global visual-linguistic relationships in an integrated manner. (2) An Uncertain Region Cross-Scale Enhancement module that can identify regions of high uncertainty using linguistic features and refine them via aggregated multi-scale features. We have evaluated our method on three benchmark datasets. The experimental results demonstrate that SLViT surpasses state-of-the-art methods with lower computational cost. The code is publicly available at: https://github.com/NaturalKnight/SLViT.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Segmentation

1666

Autonomous Exploration for Navigating in MDPs using Blackbox RL Algorithms

Pratik Gajane, Peter Auer, Ronald Ortner

We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.

List of keywords

Machine Learning -> ML: Reinforcement learning

1679

Temporal Datalog with Existential Quantification

Matthias Lanzinger, Markus Nissl, Emanuel Sallinger, Przemysław Wałęga

Existential rules, also known as tuple-generating dependencies (TGDs) or Datalog+/- rules, are heavily studied in the communities of Knowledge Representation and Reasoning, Semantic Web, and Databases, due to their rich modelling capabilities. In this paper we consider TGDs in the temporal setting, by introducing and studying DatalogMTLE, an extension of metric temporal Datalog (DatalogMTL) obtained by allowing for existential rules in programs. We show that DatalogMTLE is undecidable even in the restricted cases of guarded and weakly-acyclic programs. To address this issue we introduce uniform semantics which, on the one hand, is well-suited for modelling temporal knowledge as it prevents unintended value invention and, on the other hand, provides decidability of reasoning; in particular, reasoning becomes 2-EXPSPACE-complete for weakly-acyclic programs but remains undecidable for guarded programs. We provide an implementation for the decidable case and demonstrate its practical feasibility. Thus we obtain an expressive, yet decidable, rule language and a system which is suitable for complex temporal reasoning with existential rules.

List of keywords

Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages

1685

Learning Heuristically-selected and Neurally-guided Feature for Age Group Recognition using Unconstrained Smartphone Interaction

Yingmao Miao, Qiwei Tian, Chenhao Lin, Tianle Song, Yajie Zhou, Junyi Zhao, Shuxin Gao, Chao Shen, Minghui Yang

Owing to the boom of smartphone industries, the expansion of phone users has also been significant. Besides adults, children and elders have also begun to join the population of daily smartphone users. Such an expansion indeed facilitates the further exploration of the versatility and flexibility of digitization. However, these new users may also be susceptible to issues such as addiction, fraud, and insufficient accessibility. To fully utilize the capability of mobile devices without breaching personal privacy, we build the first corpus for age group recognition on smartphones with more than 1,445,087 unrestricted actions from 2,100 subjects. Then a series of heuristically-selected and neurally-guided features are proposed to increase the separability of the above dataset. Finally, we develop AgeCare, the first implicit and continuous system incorporated with bottom-to-top functionality without any restriction on user-phone interaction scenarios, for accurate age group recognition and age-tailored assistance on smartphones. Our system performs impressively well on this dataset and significantly surpasses the state-of-the-art methods.

List of keywords

Humans and AI -> HAI: Human-computer interaction
Humans and AI -> HAI: Personalization and user modeling
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction

1698

Unbiased Risk Estimator to Multi-Labeled Complementary Label Learning

Yi Gao, Miao Xu, Min-Ling Zhang

Multi-label learning (MLL) usually requires assigning multiple relevant labels to each instance. While a fully supervised MLL dataset needs a large amount of labeling effort, using complementary labels can help alleviate this burden. However, current approaches to learning from complementary labels are mainly designed for multi-class learning and assume that each instance has a single relevant label. This means that these approaches cannot be easily applied to MLL when only complementary labels are provided, where the number of relevant labels is unknown and can vary across instances. In this paper, we first propose the unbiased risk estimator for the multi-labeled complementary label learning (MLCLL) problem. We also provide an estimation error bound to ensure the convergence of the empirical risk estimator. In some cases, the unbiased estimator may give unbounded gradients for certain loss functions and result in overfitting. To mitigate this problem, we improve the risk estimator by minimizing a proper loss function, which has been shown to improve gradient updates. Our experimental results demonstrate the effectiveness of the proposed approach on various datasets.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Weakly supervised learning

1716

Exploring Leximin Principle for Fair Core-Selecting Combinatorial Auctions: Payment Rule Design and Implementation

Hao Cheng, Shufeng Kong, Yanchen Deng, Caihua Liu, Xiaohu Wu, Bo An, Chongjun Wang

Core-selecting combinatorial auctions (CAs) restrict the auction result to the core so that no coalition can improve its utility by engaging in collusion. The minimum-revenue-core (MRC) rule is a widely used core-selecting payment rule that maximizes the total utility of all bidders. However, the MRC rule can suffer from severe unfairness since it ignores individuals’ utilities. To address this limitation, we propose to explore the leximin principle to achieve fairness in core-selecting CAs, since the leximin principle prefers to maximize the utility of the worst-off. We theoretically analyze the resulting bidder-leximin-optimal (BLO) payment rule and provide an effective algorithm to compute the BLO outcome. Moreover, we conduct extensive experiments to show that our algorithm returns fairer utility distributions and is faster than existing algorithms for core-selecting payment rules.
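
The leximin principle mentioned above can be illustrated with a small sketch: utility vectors are compared by their worst-off entry first, then the second worst, and so on. The helper names `leximin_key` and `leximin_best` and the candidate vectors are assumptions for illustration. The sketch only shows the ordering over a given finite set of utility profiles; it does not model the core constraints and is not the BLO algorithm.

```python
def leximin_key(utilities):
    # Sorting ascending puts the worst-off bidder first, so ordinary
    # lexicographic comparison of the sorted lists is the leximin order.
    return sorted(utilities)


def leximin_best(candidates):
    """Return the leximin-optimal utility vector among the candidates."""
    return max(candidates, key=leximin_key)


# Hypothetical utility profiles for three bidders.
profiles = [(1, 9, 9), (3, 3, 4), (2, 8, 8), (3, 3, 5)]
best = leximin_best(profiles)
```

Here `(1, 9, 9)` has the largest total utility, but leximin prefers `(3, 3, 5)`: its worst-off bidder gets 3, and the tie with `(3, 3, 4)` is broken by the best-off bidder's utility.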

List of keywords

Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Computational social choice

1728

Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning

Zhe Zhang, Xiaoyang Tan

One of the major challenges of current offline reinforcement learning research is to deal with the distribution shift problem caused by the change in state-action visitations under the new policy. To address this issue, we present a novel reward shifting-based method. Specifically, to regularize the behavior of the new policy at each state, we modify the reward to be received by the new policy by shifting it adaptively according to its proximity to the behavior policy, and apply the reward shifting in opposite directions for in-distribution and out-of-distribution actions. In this way we are able to guide the learning procedure of the new policy itself by influencing the consequence of its actions explicitly, helping it to achieve a better balance between behavior constraints and policy improvement. Empirical results on the popular D4RL benchmarks show that the proposed method obtains competitive performance compared to the state-of-the-art baselines.

List of keywords

Machine Learning -> ML: Reinforcement learning

1736

A Diffusion Model with Contrastive Learning for ICU False Arrhythmia Alarm Reduction

Feng Wu, Guoshuai Zhao, Xueming Qian, Li-wei Lehman

False arrhythmia alarms in intensive care units (ICUs) significantly disturb patients and medical staff, cause noise disturbances, and slow staff response time, leading to lower quality of medical service. To alleviate false alarms in the ICU, previous works proposed rule-based methods and traditional machine learning methods. However, these methods are time-consuming and labor-intensive, and they struggle with high-dimensional, sparse, imbalanced, and limited data. To address these issues, we propose a reconstruction model based on the conditional denoising diffusion model. The model generates real arrhythmia signals with characteristics of candidate samples and uses the distance between the generated samples and the original samples to judge the alarm type. We design a network with residual links and a self-attention mechanism to capture long-term dependencies in signal sequences, and leverage a contrastive learning mechanism to maximize mutual information between true arrhythmia alarms and false arrhythmia alarms. We demonstrate the effectiveness of our approach on the MIMIC arrhythmia dataset for determining the alarm type in ventricular tachycardia and ventricular fibrillation situations. The code will be released on GitHub after the paper is accepted.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Applications
Machine Learning -> ML: Time series and data streams

1738

Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs

Yinchuan Li, Zhigang Li, Wenqian Li, Yunfeng Shao, Yan Zheng, Jianye Hao

Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions. However, these algorithms struggle to learn policy distributions that are proportional to rewards and have limited exploration capabilities. In this paper, we innovatively formulate the graph active learning problem as a generative process, named GFlowGNN, which generates various samples through sequential actions with probabilities precisely proportional to a predefined reward function. Furthermore, we propose the concept of flow nodes and flow features to efficiently model graphs as flows based on generative flow networks, where the policy network is trained with specially designed rewards. Extensive experiments on real datasets show that the proposed approach has good exploration capability and transferability, outperforming various state-of-the-art methods.

List of keywords

Machine Learning -> ML: Sequence and graph learning

1748

Deep Partial Multi-Label Learning with Graph Disambiguation

Haobo Wang, Shisong Yang, Gengyu Lyu, Weiwei Liu, Tianlei Hu, Ke Chen, Songhe Feng, Gang Chen

In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have been prevalent to deal with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.

List of keywords

Machine Learning -> ML: Multi-label

1749

Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach

Haoxuan Wang, Zhiding Yu, Yisong Yue, Animashree Anandkumar, Anqi Liu, Junchi Yan

We propose a framework for learning calibrated uncertainties under domain shifts, considering the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form that concerns the domain shift. In particular, the density ratio estimator yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our proposed method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show an agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.

List of keywords

Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-task and transfer learning

1756

Delegated Online Search

Pirmin Braun, Niklas Hahn, Martin Hoefer, Conrad Schecker

In a delegation problem, a principal P with commitment power tries to pick one out of n options. Each option is drawn independently from a known distribution. Instead of inspecting the options herself, P delegates the information acquisition to a rational and self-interested agent A. After inspection, A proposes one of the options, and P can accept or reject. In this paper, we study a natural online variant of delegation, in which the agent searches through the options in an online fashion. How can we design algorithms for P that approximate the utility of her best option in hindsight? We show that P can obtain a $\Theta(1/n)$-approximation and provide more fine-grained bounds independent of n based on two parameters. If the ratio of maximum and minimum utility for A is bounded by a factor $\alpha$, we obtain an $\Omega(\log\log \alpha / \log \alpha)$-approximation algorithm and show that this is best possible. If P cannot distinguish options with the same value for herself, we show that ratios polynomial in $1/\alpha$ cannot be avoided. If the utilities of P and A for each option are related by a factor $\beta$, we obtain an $\Omega(1 / \log \beta)$-approximation, and $O(\log \log \beta / \log \beta)$ is best possible.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Mechanism design
Agent-based and Multi-agent Systems -> MAS: Agent communication
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems

1758

Progressive Label Propagation for Semi-Supervised Multi-Dimensional Classification

Teng Huang, Bin-Bin Jia, Min-Ling Zhang

In multi-dimensional classification (MDC), each training example is associated with multiple class variables from different class spaces. However, it is rather costly to collect labeled MDC examples which have to be annotated from several dimensions (class spaces). To reduce the labeling cost, we attempt to deal with the MDC problem under the semi-supervised learning setting. Accordingly, a novel MDC approach named PLAP is proposed to solve the resulting semi-supervised MDC problem. Overall, PLAP works under the label propagation framework to utilize unlabeled data. To further consider dependencies among class spaces, PLAP deals with each class space in a progressive manner, where the previous propagation results will be used to initialize the current propagation procedure and all processed class spaces and the current one will be regarded as an entirety. Experiments validate the effectiveness of the proposed approach.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning

1762

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng

Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1763

A New Variable Ordering for In-processing Bounded Variable Elimination in SAT Solvers

Shuolin Li, Chu-Min Li, Jordi Coll, Mao Luo, Djamal Habet, Felip Manya

Bounded Variable Elimination (BVE) is an important Boolean formula simplification technique in which the variable ordering is crucial. We define a new variable ordering based on variable activity, called ESA (variable Elimination Scheduled by Activity), for in-processing BVE in Conflict-Driven Clause Learning (CDCL) SAT solvers, and incorporate it in several state-of-the-art CDCL SAT solvers. Experimental results show that the new ESA ordering consistently makes these solvers solve more instances on a benchmark set including all instances used in the crafted, application and main tracks of all SAT Competitions up to 2022. The behaviour of ESA and the reasons for its effectiveness are also analyzed.
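
For readers unfamiliar with BVE, the following sketch shows the standard elimination step that any variable ordering feeds into: a variable is removed by replacing every clause that mentions it with the non-tautological resolvents on that variable, provided the clause count does not grow by more than a bound. Clauses are frozensets of DIMACS-style integer literals. The function names, the `bound` parameter, and the example formula are assumptions for illustration; the sketch takes the candidate variable as input and says nothing about how ESA chooses it.

```python
def resolve(c1, c2, var):
    """Resolvent of c1 (contains var) and c2 (contains -var); None if tautological."""
    resolvent = (c1 - {var}) | (c2 - {-var})
    if any(-lit in resolvent for lit in resolvent):
        return None
    return frozenset(resolvent)


def try_eliminate(clauses, var, bound=0):
    """Return the clause set with `var` eliminated, or None if it would grow too much.

    Assumes no input clause contains both var and -var.
    """
    pos = [c for c in clauses if var in c]
    neg = [c for c in clauses if -var in c]
    resolvents = set()
    for p in pos:
        for n in neg:
            r = resolve(p, n, var)
            if r is not None:
                resolvents.add(r)
    # Bounded check: allow at most `bound` more clauses than were removed.
    if len(resolvents) > len(pos) + len(neg) + bound:
        return None
    rest = {c for c in clauses if var not in c and -var not in c}
    return rest | resolvents


# (x1 or x2) and (not x1 or x3) and (not x1 or not x2)
cnf = {frozenset({1, 2}), frozenset({-1, 3}), frozenset({-1, -2})}
simplified = try_eliminate(cnf, 1)
```

Eliminating variable 1 here leaves the single clause `{2, 3}`, because the resolvent of `{1, 2}` and `{-1, -2}` is tautological and is dropped.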

List of keywords

Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Solvers and tools

1778

Learning to Self-Reconfigure for Freeform Modular Robots via Altruism Proximal Policy Optimization

Lei Wu, Bin Guo, Qiuyun Zhang, Zhuo Sun, Jieyi Zhang, Zhiwen Yu

The advantages of modular robot systems stem from their ability to change between different configurations, which enables them to adapt to complex and dynamic real-world environments. How to change the configuration of a modular robot system accurately and efficiently, i.e., the self-reconfiguration problem, is therefore essential. Existing reconfiguration algorithms are based on discrete motion primitives and are suitable for lattice-type modular robots. For freeform modular robots, the modules are connected without alignment and the motion space is continuous, which makes the existing reconfiguration methods infeasible. In this work, we design for freeform modular robots a parallel distributed self-reconfiguration algorithm based on multi-agent reinforcement learning to realize the automatic design of conflict-free reconfiguration controllers in continuous action spaces. We introduce a collaborative mechanism into the reinforcement learning to avoid conflicts. Furthermore, we design distributed termination criteria to achieve timely termination under local observability and limited communication. Simulations show that, compared to the baselines, the proposed method improves efficiency and congruence, and the module movements exhibit altruism.

List of keywords

Robotics -> ROB: Learning in robotics
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Robotics -> ROB: Multi-robot systems

1793

Proportionally Fair Online Allocation of Public Goods with Predictions

Siddhartha Banerjee, Safwan Hossain, Vasilis Gkatzelis, Billy Jin, Evi Micha, Nisarg Shah

We design online algorithms for fair allocation of public goods to a set of $N$ agents over a sequence of $T$ rounds and focus on improving their performance using predictions. In the basic model, a public good arrives in each round, and every agent reveals their value for it upon arrival. The algorithm must irrevocably decide the investment in this good without exceeding a total budget of $B$ across all rounds. The algorithm can utilize (potentially noisy) predictions of each agent’s total value for all remaining goods. The algorithm’s performance is measured using a proportional fairness objective, which informally demands that every group of agents be rewarded proportional to its size and the cohesiveness of its preferences. We show that no algorithm can achieve better than $\Theta(T/B)$ proportional fairness without predictions. With reasonably accurate predictions, the situation improves significantly, and $\Theta(\log (T/B))$ proportional fairness is achieved. We also extend our results to a general setting wherein a batch of $L$ public goods arrive in each round and $O(\log (\min(N,L) \cdot T/B))$ proportional fairness is achieved. Our exact bounds are parameterized as a function of the prediction error, with performance degrading gracefully with increasing errors.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division

1796

Basket Representation Learning by Hypergraph Convolution on Repeated Items for Next-basket Recommendation

Yalin Yu, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang

Basket representation plays an important role in the task of next-basket recommendation. However, existing methods generally adopt pooling operations to learn a basket’s representation, from which two critical issues can be identified. First, they treat a basket as a set of independent and identically distributed items. By conducting data analysis on a real dataset, we find that items occurring in the same basket have much higher correlations than randomly selected items. Second, although some works have recognized the importance of items repeatedly purchased in multiple baskets, they ignore the correlations among the repeated items in the same basket, whose importance is shown by our data analysis. In this paper, we propose a novel Basket Representation Learning (BRL) model by leveraging the correlations among intra-basket items. Specifically, we first connect all the items (in a basket) as a hyperedge, where the correlations among different items can be well exploited by hypergraph convolution operations. Meanwhile, we also connect all the repeated items in the same basket as a hyperedge, whereby their correlations can be further strengthened. We generate a negative (positive) view of the basket by data augmentation on repeated (non-repeated) items, and apply contrastive learning to force more agreement on repeated items. Finally, experimental results on three real datasets show that our approach performs better than eight baselines in ranking accuracy.

List of keywords

Data Mining -> DM: Recommender systems
Data Mining -> DM: Information retrieval

1798

Multi-Modality Deep Network for JPEG Artifacts Reduction

Xuhao Jiang, Weimin Tan, Qing Lin, Chenxi Ma, Bo Yan, Liquan Shen

In recent years, many convolutional neural network-based models have been designed for JPEG artifacts reduction and have achieved notable progress. However, few methods are suitable for reducing artifacts in images compressed at extremely low bitrates. The main challenge is that a highly compressed image loses too much information, making it difficult to reconstruct a high-quality image. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides potential prior information about the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, show that our method obtains better deblocking results than the state-of-the-art methods.

List of keywords

Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Machine learning for vision

1805

Totally Dynamic Hypergraph Neural Networks

Peng Zhou, Zongqian Wu, Xiangxiang Zeng, Guoqiu Wen, Junbo Ma, Xiaofeng Zhu

As an extension of graphs, hypergraphs can naturally represent multi-relationships and have great application prospects in real life. The static hypergraph neural network relies too much on the initialized hypergraph structure and cannot mine hidden relationships within the data; the dynamic hypergraph neural network optimizes the hypergraph structure in the process of model iteration and can mine more information. However, the existing dynamic hypergraph neural networks ignore the features of hyperedges and cannot adjust the number of hyperedges, which imposes limitations when adjusting hypergraphs. We propose a novel hypergraph neural network that can adjust the number of hyperedges while optimizing the hypergraph structure. Our method focuses on hyperedge features and learns their feature distribution rather than fixed hyperedge features. Hyperedges are obtained by sampling from the learned distribution; the hypergraph is then constructed according to the attention coefficients between the sampled hyperedges and the nodes; finally, the node features are updated using the hypergraph convolution algorithm. Experimental results demonstrate the effectiveness of our method.

List of keywords

Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks

1809

Expanding the Hyperbolic Kernels: A Curvature-aware Isometric Embedding View

Meimei Yang, Pengfei Fang, Hui Xue

Modeling data relations as a hierarchical structure has proven beneficial for many learning scenarios, and the hyperbolic space, with negative curvature, can encode such data hierarchy without distortion. Several recent studies also show that the representation power of the hyperbolic space can be further improved by endowing it with kernel methods. Unfortunately, the known kernel methods developed in hyperbolic space are limited by their adaptation capacity or by distortion issues. This paper addresses these issues through a novel embedding function. To this end, we propose a curvature-aware isometric embedding, which establishes an isometry from the Poincaré model to a special reproducing kernel Hilbert space (RKHS). Then we can further define a series of kernels on this RKHS, including several positive definite kernels and an indefinite kernel. Thorough experiments are conducted to demonstrate the superiority of our proposals over existing hyperbolic and Euclidean kernels in various learning tasks, e.g., zero-shot learning and graph learning.

List of keywords

Machine Learning -> ML: Kernel methods
Machine Learning -> ML: Geometric learning

1812

Quantifying Harm

Sander Beckers, Hana Chockler, Joseph Halpern

In [Beckers et al. 2022] a qualitative notion of harm is defined: either harm is caused, or it is not. For practical applications, we often need to quantify harm; for example, we may want to choose the least harmful of a set of possible interventions. We first present a quantitative definition of harm in a deterministic context involving a single individual, then we consider the issues involved in dealing with uncertainty regarding the context and going from a notion of harm for a single individual to a notion of “societal harm”, which involves aggregating the harm to individuals. We show that the “obvious” way of doing this (just taking the expected harm for an individual and then summing the expected harm over all individuals) can lead to counterintuitive or inappropriate answers, and discuss alternatives, drawing on work from the decision-theory literature.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Decision and utility theory

1813

A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation

Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen

Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previously structured map method provides the average historical appearance of visited nodes, while it ignores distinctive contributions of various images and potent information retention in the reasoning process. This work proposes a dual semantic-aware recurrent global-adaptive network (DSRG) to address the above problems. First, DSRG proposes an instruction-guidance linguistic module (IGL) and an appearance-semantics visual module (ASV) for boosting vision and language semantic learning respectively. For the memory mechanism, a global adaptive aggregation module (GAA) is devised for explicit panoramic observation fusion, and a recurrent memory fusion module (RMF) is introduced to supply implicit temporal hidden states. Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods. Code is available at https://github.com/CrystalSixone/DSRG.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning

1816

Cardinality-Minimal Explanations for Monotonic Neural Networks

Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks

In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms.

List of keywords

Machine Learning -> ML: Explainable/Interpretable machine learning
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Theory of deep learning

1817

Explainable Multi-Agent Reinforcement Learning for Temporal Queries

Kayla Boggess, Sarit Kraus, Lu Feng

As multi-agent reinforcement learning (MARL) systems are increasingly deployed throughout society, it is imperative yet challenging for users to understand the emergent behaviors of MARL agents in complex environments. This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query, which specifies a sequence of tasks completed by agents with possible cooperation. The proposed approach encodes the temporal query as a PCTL* logic formula and checks if the query is feasible under a given MARL policy via probabilistic model checking. Such explanations can help reconcile discrepancies between the actual and anticipated multi-agent behaviors. The proposed approach also generates correct and complete explanations to pinpoint reasons that make a user query infeasible. We have successfully applied the proposed approach to four benchmark MARL domains (up to 9 agents in one domain). Moreover, the results of a user study show that the generated explanations significantly improve user performance and satisfaction.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Human-agent interaction

1826

Levin Tree Search with Context Models

Laurent Orseau, Levi Lelis, Marcus Hutter

Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories — guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik’s cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.

List of keywords

Search -> S: Search and machine learning
Search -> S: Heuristic search

1829

Multi-Task Learning via Time-Aware Neural ODE

Feiyang YE, Xuehao Wang, Yu Zhang, Ivor Tsang

Multi-Task Learning (MTL) is a well-established paradigm for learning shared models for a diverse set of tasks. Moreover, MTL improves data efficiency by training all tasks jointly. However, directly optimizing the losses of all the tasks may lead to imbalanced performance across the tasks due to the competition among tasks for the shared parameters in MTL models. Many MTL methods try to mitigate this problem by dynamically weighting task losses or manipulating task gradients. Different from existing studies, in this paper, we propose a Neural Ordinary diffeRential equation based Multi-tAsk Learning (NORMAL) method to alleviate this issue by modeling task-specific feature transformations from the perspective of dynamic flows built on the Neural Ordinary Differential Equation (NODE). Specifically, the proposed NORMAL model designs a time-aware neural ODE block to learn task-specific time information in NODE automatically via gradient descent methods; this time information determines the task positions of feature transformations in the dynamic flow. In this way, the proposed NORMAL model handles the problem of competing shared parameters by learning task positions. Moreover, the learned task positions can be used to evaluate the relevance of different tasks. Extensive experiments show that the proposed NORMAL model outperforms state-of-the-art MTL models.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning

1830

New Bounds and Constraint Programming Models for the Weighted Vertex Coloring Problem

Olivier Goudet, Cyril Grelier, David Lesaint

This paper addresses the weighted vertex coloring problem (WVCP) which is an NP-hard variant of the graph coloring problem with various applications. Given a vertex-weighted graph, the problem consists of partitioning vertices in independent sets (colors) so as to minimize the sum of the maximum weights of the colors. We first present an iterative procedure to reduce the size of WVCP instances and prove new upper bounds on the objective value and the number of colors needed to construct optimal solutions. Alternative constraint programming models are then introduced which rely on primal and dual encodings of the problem and use symmetry-breaking constraints. A large number of experiments are conducted on benchmark instances. We analyze the impact of using specific bounds to reduce the search space and speed up the exact resolution of instances. New optimality proofs are reported for some benchmark instances.
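As a small illustration of the objective described in this abstract (the sum, over colors, of the maximum vertex weight in each color), the following sketch evaluates a given coloring. The function name `wvcp_cost`, the dictionary-based graph encoding, and the example graph are assumptions made for illustration only; they are not taken from the paper, and the sketch only scores a coloring rather than searching for an optimal one.

```python
from collections import defaultdict


def wvcp_cost(weights, edges, coloring):
    """Score a coloring under the weighted vertex coloring objective.

    weights:  dict vertex -> non-negative weight (hypothetical encoding)
    edges:    iterable of (u, v) pairs
    coloring: dict vertex -> color label
    Raises ValueError if two adjacent vertices share a color.
    """
    # Each color class must be an independent set.
    for u, v in edges:
        if coloring[u] == coloring[v]:
            raise ValueError(f"vertices {u!r} and {v!r} are adjacent but share a color")

    # Track the heaviest vertex in each color class (weights assumed >= 0).
    heaviest = defaultdict(int)
    for vertex, color in coloring.items():
        heaviest[color] = max(heaviest[color], weights[vertex])

    # Objective: sum of the maximum weights of the colors.
    return sum(heaviest.values())


# Hypothetical example: a path a - b - c.
weights = {"a": 5, "b": 3, "c": 4}
edges = [("a", "b"), ("b", "c")]
two_colors = {"a": 0, "b": 1, "c": 0}    # color 0 costs max(5, 4), color 1 costs 3
three_colors = {"a": 0, "b": 1, "c": 2}  # every vertex pays its own weight

cost_two = wvcp_cost(weights, edges, two_colors)
cost_three = wvcp_cost(weights, edges, three_colors)
```

In this toy instance, grouping the two heavy non-adjacent vertices into one color is cheaper than giving each vertex its own color, which is the trade-off the objective captures.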

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Search -> S: Combinatorial search and optimisation

1831

Norm Deviation in Multiagent Systems: A Foundation for Responsible

Amika Singh, Munindar Singh

The power of norms in both human societies and sociotechnical systems arises from the facts that (1) norms characterize acceptable behavior in high-level terms and (2) they are not hard controls and can be deviated from. Thus, the design of autonomous agents faces an essential tension: these agents must both (1) respect applicable societal norms, including laws and policies, and (2) deviate from those norms when blindly following them may lead to diminished outcomes. We propose a conceptual foundation for norm deviation. As a guiding framework, we adopt Habermas’s theory of communicative action comprising objective, subjective, and practical validity claims regarding the suitability of such deviation. Our analysis thus goes beyond previous studies of norm deviation and yields principles uniting norms and values by which to develop effective autonomous agents.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Normative systems

1833

Probabilistic Masked Attention Networks for Explainable Sequential Recommendation

Huiyuan Chen, Kaixiong Zhou, Zhimeng Jiang, Michael Yeh, Xiaoting Li, Menghai Pan, Yan Zheng, Xia Hu, Hao Yang

The recently proposed Transformer-based models are highly powerful for modeling temporal dynamics of user preference in sequential recommendation. Most variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attentions inevitably assign probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. To tackle these issues, we propose a Probabilistic Masked Attention Network (PMAN) to identify the sparse pattern of attentions, which is more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attentions under a constrained optimization framework. As such, PMAN makes it possible to select, in a data-driven fashion, which information is critical to retain and which can be dropped. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly, and the performance gain becomes larger for noisier sequences. Our code and data are available at https://anonymous.4open.science/r/PMAN_Rec-E72E.

List of keywords

Data Mining -> DM: Collaborative filtering
Data Mining -> DM: Information retrieval

1834

A Unifying Formal Approach to Importance Values in Boolean Functions

Hans Harder, Clemens Dubslaff, Christel Baier, Simon Jantsch

Boolean functions and their representation through logics, circuits, AI classifiers, or binary decision diagrams (BDDs) play a central role in the design and analysis of computing systems. Quantifying the relative impact of variables on the truth value by means of importance values can provide useful insights to steer system design and debugging. In this paper, we introduce a uniform framework for reasoning about importance values by a generic notion of importance value functions (IVFs). IVFs are identified by a set of axioms that are motivated from several notions of importance values introduced in the literature, including Ben-Or and Linial’s influence and Chockler, Halpern, and Kupferman’s notion of responsibility and blame. We establish a connection of IVFs to game-theoretic concepts such as Shapley and Banzhaf values that measure the impact of players on outcomes in coalition games. Exploiting BDD-based symbolic methods and projected model counting, we devise and evaluate practical computation schemes for IVFs.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Cooperative games
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Constraint Satisfaction and Optimization -> CSO: Satisfiability

1836

Learning constraint networks over unknown constraint languages

Christian Bessiere, Clement Carbonnel, Areski Himeur

Constraint acquisition is the task of learning a constraint network from examples of solutions and non-solutions. Existing constraint acquisition systems typically require advance knowledge of the target network’s constraint language, which significantly narrows their scope of applicability. In this paper we propose a constraint acquisition method that computes a suitable constraint language as part of the learning process, eliminating the need for any advance knowledge. We report preliminary experiments on various acquisition benchmarks.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Constraint Satisfaction and Optimization -> CSO: Constraint programming

1838

On Translations between ML Models for XAI Purposes

Alexis de Colnet, Pierre Marquis

In this paper, the succinctness of various ML models is studied. To be more precise, the existence of polynomial-time and polynomial-space translations between representation languages for classifiers is investigated. The languages that are considered include decision trees, random forests, several types of boosted trees, binary neural networks, Boolean multilayer perceptrons, and various logical representations of binary classifiers. We provide a complete map indicating for every pair of languages C, C’ whether or not a polynomial-time / polynomial-space translation exists from C to C’. We also explain how to take advantage of the resulting map for XAI purposes.

List of keywords

Knowledge Representation and Reasoning -> KRR: Knowledge compilation
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages

1842

Game Theory with Simulation of Other Players

Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Game-theoretic interactions with AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to simulate an AI agent (for example because its source code is known), which allows others to accurately predict the agent’s actions. This could lower the bar for trust and cooperation. In this paper, we first formally define games in which one player can simulate another at a cost, and derive some basic properties of such games. Then, we prove a number of results for such games, including: (1) introducing simulation into generic-payoff normal-form games makes them easier to solve; (2) if the only obstacle to cooperation is a lack of trust in the possibly-simulated agent, simulation enables equilibria that improve the outcome for both agents; and (3) however, there are settings where introducing simulation results in strictly worse outcomes for both players.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games

1846

Gapformer: Graph Transformer with Graph Pooling for Node Classification

Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu

Graph Transformers (GTs) have proved their advantage in graph-level tasks. However, existing GTs still perform unsatisfactorily on the node classification task due to 1) the overwhelming unrelated information obtained from a vast number of irrelevant distant nodes and 2) the quadratic complexity regarding the number of nodes via the fully connected attention mechanism. In this paper, we present Gapformer, a method for node classification that deeply incorporates Graph Transformer with Graph Pooling. More specifically, Gapformer coarsens the large-scale nodes of a graph into a smaller number of pooling nodes via local or global graph pooling methods, and then computes the attention solely with the pooling nodes rather than all other nodes. In such a manner, the negative influence of the overwhelming unrelated nodes is mitigated while maintaining the long-range information, and the quadratic complexity is reduced to linear complexity with respect to the fixed number of pooling nodes. Extensive experiments on 13 node classification datasets, including homophilic and heterophilic graph datasets, demonstrate the competitive performance of Gapformer over existing Graph Neural Networks and GTs.

List of keywords

Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks

1850

MultiPar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations

Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won Park

As we move closer to real-world AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is the Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behavior.

List of keywords

Machine Learning -> ML: Attention models
Computer Vision -> CV: Video analysis and understanding
Humans and AI -> HAI: Computer-aided education

1856

Towards a Better Understanding of Learning with Multiagent Teams

David Radke, Kyle Tilbury, Kate Larson, Tim Brecht

While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

1868

Anticipatory Fictitious Play

Alex Cloud

Fictitious play is an algorithm for computing Nash equilibria of matrix games. Recently, machine learning variants of fictitious play have been successfully applied to complicated real-world games. This paper presents a simple modification of fictitious play which is a strict improvement over the original: it has the same theoretical worst-case convergence rate, is equally applicable in a machine learning context, and enjoys superior empirical performance. We conduct an extensive comparison of our algorithm with fictitious play, proving an optimal $O(t^{-1})$ convergence rate for certain classes of games, demonstrating superior performance numerically across a variety of games, and concluding with experiments that extend these algorithms to the setting of deep multiagent reinforcement learning.
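For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of standard (not anticipatory) fictitious play on a zero-sum matrix game: in each round, both players best-respond to the opponent's empirical action counts. The function name `fictitious_play`, the rock-paper-scissors payoff matrix, and the iteration count are illustrative assumptions; the paper's modification is not described in enough detail here to reproduce and is not implemented.

```python
def fictitious_play(payoff, iterations):
    """Simultaneous fictitious play on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols

    for _ in range(iterations):
        # Row player maximizes payoff against the column player's past play.
        row_action = max(
            range(n_rows),
            key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n_cols)),
        )
        # Column player minimizes the row player's payoff against the row
        # player's past play.
        col_action = min(
            range(n_cols),
            key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(n_rows)),
        )
        # Both actions are chosen before either count is updated.
        row_counts[row_action] += 1
        col_counts[col_action] += 1

    row_freq = [count / iterations for count in row_counts]
    col_freq = [count / iterations for count in col_counts]
    return row_freq, col_freq


# Rock-paper-scissors: the unique Nash equilibrium is uniform play.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
row_freq, col_freq = fictitious_play(RPS, 10_000)
```

The individual actions cycle in ever longer runs, but the empirical frequencies drift toward the uniform equilibrium; it is this convergence of averages that the stated rates refer to.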

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning

1875

Advancing Post-Hoc Case-Based Explanation with Feature Highlighting

Eoin Kenny, Eoin Delaney, Mark T. Keane

Explainable AI (XAI) has been proposed as a valuable tool to assist in downstream tasks involving human-AI collaboration. Perhaps the most psychologically valid XAI techniques are case-based approaches which display “whole” exemplars to explain the predictions of black-box AI systems. However, for such post-hoc XAI methods dealing with images, there has been no attempt to improve their scope by using multiple clear feature “parts” of the images to explain the predictions while linking back to relevant cases in the training data, thus allowing for more comprehensive explanations that are faithful to the underlying model. In this work, we address this gap by proposing two general algorithms (latent and superpixel-based) which can isolate multiple clear feature “parts” in a test image, and then connect them to the explanatory cases found in the training data, before testing their effectiveness in a carefully designed user study. Results demonstrate that the proposed algorithms appropriately calibrate a user’s feelings of correctness for ambiguous classifications in real world data on the ImageNet dataset.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Case-based reasoning

1883

On the Paradox of Learning to Reason from Data

Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, Guy Van den Broeck

Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has, in fact, learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models (e.g. T5) and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.

List of keywords

Knowledge Representation and Reasoning -> KRR: Learning and reasoning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP

1884

Improve Video Representation with Temporal Adversarial Augmentation

Jinhao Duan, Quanfu Fan, Hao Cheng, Xiaoshuang Shi, Kaidi Xu

Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) if used in an appropriate manner. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks with respect to video clips by maximizing a temporal-related loss function. We demonstrate that videos augmented by TA obtain diverse temporal views, which significantly impact the focus of neural networks. Training with these examples remedies the flaw of unbalanced temporal information perception and enhances the ability to defend against temporal shifts, ultimately leading to better generalization. To leverage TA, we propose the Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-something V1 & V2 and Diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models by notable margins, e.g., training TSM with TAF achieves consistent improvements on Something-something V1 (+1.3%) and V2 (+0.9%), without introducing additional parameters or computational costs. As a byproduct, TAF also improves robustness under out-of-distribution (OOD) settings.

List of keywords

Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning

1889

One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction

Jan Tönshoff, Berke Kisin, Jakob Lindner, Martin Grohe

We propose a universal Graph Neural Network architecture which can be trained as an end-to-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem-specific heuristics for any CSP in a purely data-driven manner. The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs. We perform a thorough empirical evaluation where we learn heuristics for well-known and important CSPs, both decision and optimisation problems, from random data, including graph coloring, MAXCUT, MAX-k-SAT, and the general RB model. Our approach significantly outperforms prior end-to-end approaches for neural combinatorial optimization. It can compete with conventional heuristics and solvers on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.

List of keywords

Machine Learning -> ML: Reinforcement learning
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Machine Learning -> ML: Sequence and graph learning

1918

Contrastive Learning for Sign Language Recognition and Translation

Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Kang Xia, Lei Xie, Sanglu Lu

Two problems are widespread in current end-to-end sign language processing architectures. One is the CTC spike phenomenon, which weakens the visual representational ability in Continuous Sign Language Recognition (CSLR). The other is the exposure bias problem, which leads to the accumulation of translation errors during inference in Sign Language Translation (SLT). In this paper, we tackle these issues by introducing contrastive learning, aiming to enhance both visual-level feature representation and semantic-level error tolerance. Specifically, to alleviate the CTC spike phenomenon and enhance visual-level representation, we design a visual contrastive loss by minimizing the visual feature distance between different augmented samples of frames in one sign video, so that the model can further explore features by utilizing numerous unlabeled frames in an unsupervised way. To alleviate the exposure bias problem and improve semantic-level error tolerance, we design a semantic contrastive loss by re-inputting the predicted sentence into the semantic module and comparing features of the ground-truth sequence and the predicted sequence, thereby exposing the model to its own mistakes. In addition, we propose two new metrics, i.e., Blank Rate and Consecutive Wrong Word Rate, to directly reflect our improvement on the two problems. Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Action and behavior recognition

1920

Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue

Cristian-Paul BARA, Ziqiao Ma, Yingzhuo Yu, Julie Shah, Joyce Chai

Collaborative tasks often begin with partial task knowledge and incomplete plans from each partner. To complete these tasks, partners need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal. While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collaboration. To address this limitation, this paper takes a step towards Collaborative Plan Acquisition, where humans and agents strive to learn and communicate with each other to acquire a complete plan for joint tasks. Specifically, we formulate a novel problem for agents to predict the missing task knowledge for themselves and for their partners based on rich perceptual and dialogue history. We extend a situated dialogue benchmark for symmetric collaborative tasks in a 3D blocks world and investigate computational strategies for plan acquisition. Our empirical results suggest that predicting the partner’s missing knowledge is a more viable approach than predicting one’s own. We show that explicit modeling of the partner’s dialogue moves and mental states produces improved and more stable results than without. These results provide insight for future AI agents that can predict what knowledge their partner is missing and, therefore, can proactively communicate such information to help the partner acquire such missing knowledge toward a common understanding of joint tasks.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning

1927

CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

Xiaohan Yu, Jun Wang, Yongsheng Gao

Ultra-fine-grained visual classification (ultra-FGVC) aims to classify sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works consider explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves the generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Applications

1932

Incorporating Unlikely Negative Cues for Distinctive Image Captioning

Zhengcong Fei, Junshi Huang

Recent neural image captioning models have achieved promising results on some automatic metrics, yet suffer badly from the generic sentence problem, limiting their applications to a few toy scenarios. An interesting approach, namely negative training, has been proposed to remind the model not to generate high-frequency but meaningless sentences. However, its usability in image captioning is hindered by one issue: considering only the frequency perspective ignores low-frequency but generic and vague sentences, especially when facing diversified visual scenes. In this paper, we propose to incorporate unlikely negative knowledge into image captioning, to keep the model away from undesirable generic descriptions while avoiding the above problems. Specifically, we first train a negative teacher model that can produce image-wise generic sentences with retrieval entropy-filtered data, and then the student model is required to maximize the distance from the teacher through multi-level negative knowledge transfer. Empirical results on the MS COCO benchmark verify that our plug-and-play unlikely negative framework shows a significant performance gain in both accuracy and diversity, compared to previous state-of-the-art distinctive image captioning methods.

List of keywords

Computer Vision -> CV: Vision and language
Machine Learning -> ML: Learning preferences or rankings

1938

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

Qucheng Peng, Zhengming Ding, Lingjuan Lyu, Lichao Sun, Chen Chen

Source-Free domain adaptation transfers the source-trained model towards the target domain without exposing the source data, trying to dispel concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the Black-Box setting only allows the use of the outputs of the source model, but it suffers even more severely from overfitting on the source domain because the source model’s weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for Black-Box domain adaptation from both input-level and network-level regularization. For the input level, we design a new data augmentation technique, Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. For the network level, we develop a Subnetwork Distillation mechanism to transfer knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1953

Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking

Chamath Abeysinghe, Chris Reid, Hamid Rezatofighi, Bernd Meyer

Tracking individuals is a vital part of many experiments conducted to understand collective behaviour. Ants are the paradigmatic model system for such experiments but their lack of individually distinguishing visual features and their high colony densities make it extremely difficult to perform reliable tracking automatically. Additionally, the wide diversity of their species’ appearances makes a generalized approach even harder. In this paper, we propose a data-driven multi-object tracker that, for the first time, employs domain adaptation to achieve the required generalisation. This approach is built upon a joint-detection-and-tracking framework that is extended by a set of domain discriminator modules integrating an adversarial training strategy in addition to the tracking loss. In addition to this novel domain-adaptive tracking framework, we present a new dataset and a benchmark for the ant tracking problem. The dataset contains 57 video sequences with full trajectory annotation, including 30k frames captured from two different ant species moving on different background patterns. It comprises 33 and 24 sequences for source and target domains, respectively. We compare our proposed framework against other domain-adaptive and non-domain-adaptive multi-object tracking baselines using this dataset and show that incorporating domain adaptation at multiple levels of the tracking pipeline yields significant improvements. The code and the dataset are available at https://blinded.

List of keywords

Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Applications
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

1957

Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

Rubens Moraes, David Aleixo, Lucas Nascimento Ferreira, Levi Lelis

This paper introduces Local Search (2L), a novel algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Other learning methods, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Computer games
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Search -> S: Local search

1966

Inferring Private Valuations from Behavioral Data in Bilateral Sequential Bargaining

Lvye Cui, Haoran Yu

Inferring bargainers’ private valuations on items from their decisions is crucial for analyzing their strategic behaviors in bilateral sequential bargaining. Most existing approaches that infer agents’ private information from observable data either rely on strong equilibrium assumptions or require a careful design of agents’ behavior models. To overcome these weaknesses, we propose a Bayesian Learning-based Valuation Inference (BLUE) framework. Our key idea is to derive feasible intervals of bargainers’ private valuations from their behavior data, using the fact that most bargainers do not choose strictly dominated strategies. We leverage these feasible intervals to guide our inference. Specifically, we first model each bargainer’s behavior function (which maps his valuation and bargaining history to decisions) via a recurrent neural network. Second, we learn these behavior functions by utilizing a novel loss function defined based on feasible intervals. Third, we derive the posterior distributions of bargainers’ valuations according to their behavior data and learned behavior functions. Moreover, we account for the heterogeneity of bargainer behaviors, and propose a clustering algorithm (K-Loss) to improve the efficiency of learning these behaviors. Experiments on both synthetic and real bargaining data show that our inference approach outperforms baselines.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Other

1983

XFormer: Fast and Accurate Monocular 3D Body Capture

Lihui Qian, Xintong Han, Faqiang Wang, Hongyu Liu, Haoye Dong, Zhiwen Li, Huawei Wei, Zhe Lin, Cheng-Bin Jin

We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. Our architecture is smartly designed, which enables us to train on various types of datasets including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images. This effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs blazing fast (over 30fps on a single CPU core) and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.

List of keywords

Computer Vision -> CV: 3D computer vision

2002

A Novel Learnable Interpolation Approach for Scale-Arbitrary Image Super-Resolution

Jiahao Chao, Zhou Zhou, Hongfan Gao, Jiali Gong, Zhenbing Zeng, Zhengfeng Yang

Deep convolutional neural networks (CNNs) have achieved unprecedented success in single image super-resolution over the past few years. Meanwhile, there is an increasing demand for single image super-resolution with arbitrary scale factors in real-world scenarios. Many approaches adopt scale-specific multi-path learning to cope with multi-scale super-resolution with a single network. However, these methods require a large number of parameters. To achieve a better balance between reconstruction quality and parameter count, we propose a learnable interpolation method that leverages the advantages of neural networks and interpolation methods to tackle the scale-arbitrary super-resolution task. The scale factor is treated as a function parameter for generating the kernel weights for the learnable interpolation. We demonstrate that the learnable interpolation builds a bridge between neural networks and traditional interpolation methods. Experiments show that the proposed learnable interpolation requires far fewer parameters and outperforms state-of-the-art super-resolution methods.

List of keywords

Computer Vision -> CV: Other
Machine Learning -> ML: Convolutional networks

2005

Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint

Canh Pham, Tan Tran, Dung Ha, My T. Thai

This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, DLA and RLA. DLA is a deterministic algorithm that provides an approximation factor of $6+\epsilon$, while RLA is a randomized algorithm with an approximation factor of $4+\epsilon$. Both run in $O(n \log(1/\epsilon)/\epsilon)$ query complexity. The key idea for obtaining a constant approximation ratio with linear queries lies in: (1) dividing the ground set into two appropriate subsets to find a near-optimal solution over these subsets with linear queries; and (2) combining a threshold greedy with properties of two disjoint sets or a random selection process to improve solution quality. In addition to the theoretical analysis, we have evaluated our proposed solutions with three applications: Revenue Maximization, Image Summarization, and Maximum Weighted Cut, showing that our algorithms not only return results comparable to state-of-the-art algorithms but also require significantly fewer queries.
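To make the problem setting in this abstract concrete, the following minimal sketch maximizes a weighted cut function (a standard non-monotone submodular objective) under a knapsack budget while counting oracle queries. It is a plain density-greedy baseline, not DLA or RLA, and it can use a quadratic number of queries, which is exactly the cost that linear-query algorithms aim to avoid. The graph, costs, budget, and function names are hypothetical.

```python
def cut_value(selected, edges):
    """Total weight of edges with exactly one endpoint in `selected`."""
    return sum(w for u, v, w in edges if (u in selected) != (v in selected))


def density_greedy(items, cost, budget, f):
    """Repeatedly add the affordable item with the best positive gain per unit cost.

    Returns the chosen set, its value, and the number of oracle queries to f.
    This is an illustrative baseline only (assumption: not the paper's method).
    """
    selected, spent, queries = set(), 0.0, 0
    current = f(selected)
    queries += 1
    while True:
        best, best_density, best_value = None, 0.0, current
        for e in items:
            if e in selected or spent + cost[e] > budget:
                continue
            value = f(selected | {e})  # one oracle query
            queries += 1
            density = (value - current) / cost[e]
            if density > best_density:
                best, best_density, best_value = e, density, value
        if best is None:  # nothing affordable improves the objective
            break
        selected.add(best)
        spent += cost[best]
        current = best_value
    return selected, current, queries


# Hypothetical toy instance: weighted edges (u, v, weight) and item costs.
edges = [("a", "b", 3), ("a", "c", 2), ("b", "c", 1), ("c", "d", 4)]
cost = {"a": 1, "b": 1, "c": 2, "d": 1}
chosen, value, queries = density_greedy(
    list(cost), cost, budget=2, f=lambda s: cut_value(s, edges)
)
print(sorted(chosen), value, queries)
```

Counting queries separately from running time mirrors how the abstract measures cost: each evaluation of the objective is treated as one unit of work.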

List of keywords

Machine Learning -> ML: Optimization
Constraint Satisfaction and Optimization -> CSO: Constraint optimization

2012

BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning

Yunchao Yang, Yipeng Zhou, Miao Hu, Di Wu, Quan Z. Sheng

Federated learning (FL) is a prospective distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, making the optimal reward budget allocation complicated. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization named BARA (Budget Allocation for Reverse Auction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records so that the reward budget allocated to each communication round is dynamically optimized so as to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.

List of keywords

Machine Learning -> ML: Federated learning

2015

Reinforcement Learning Approaches for Traffic Signal Control under Missing Data

Hao Mei, Junxian Li, Bin Shi, Hua Wei

Reinforcement learning (RL) methods for traffic signal control have achieved better performance than conventional rule-based approaches. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observations of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observations. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not equipped with sensors and thus have no direct observations around them. To the best of our knowledge, we are the first to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first one imputes the traffic states to enable adaptive control, and the second one imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also provide further investigations on how missing data influences the performance of our model.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Reinforcement learning

2038

LION: Label Disambiguation for Semi-supervised Facial Expression Recognition with Progressive Negative Learning

Zhongjing Du, Xu Jiang, Peng Wang, Zhou Qizheng, Xi Wu, Jiliu Zhou, Yan Wang

Semi-supervised deep facial expression recognition (SS-DFER) has recently attracted rising research interest due to its more practical setting of abundant unlabeled data. However, two main problems are not considered in current SS-DFER methods: 1) label ambiguity, i.e., given labels that mismatch the facial expressions; 2) inefficient utilization of low-confidence unlabeled data. In this paper, we propose a novel SS-DFER method, including a Label DIsambiguation module and a PrOgressive Negative Learning module, namely LION, to simultaneously address both problems. Specifically, the label disambiguation module operates on labeled data, including data with accurate labels (clear data) and ambiguous labels (ambiguous data). It first uses clear data to calculate prototypes for all the expression classes, and then re-assigns a candidate label set to all the ambiguous data. Based on the prototypes and the candidate label set, the ambiguous data can be relabeled more accurately. As for low-confidence unlabeled data, the progressive negative learning module is developed to iteratively mine more complete complementary labels, which can guide the model to reduce the association between data and corresponding complementary labels. Experiments on three challenging datasets show that our method significantly outperforms the current state-of-the-art approaches in SS-DFER and surpasses fully-supervised baselines. Code will be available at https://github.com/NUM-7/LION.
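The prototype step described in this abstract can be illustrated with a small sketch: class prototypes are taken as the mean feature vector of the clear data for each class, and an ambiguous sample receives the labels of its nearest prototypes as a candidate set. The use of mean vectors, squared Euclidean distance, a fixed candidate-set size `k`, and the toy 2-D features are all assumptions made for illustration; the abstract does not specify these details.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def class_prototypes(features, labels):
    """Map each class label to the mean feature of its clear samples."""
    by_class = {}
    for feat, lab in zip(features, labels):
        by_class.setdefault(lab, []).append(feat)
    return {lab: mean_vector(feats) for lab, feats in by_class.items()}


def candidate_labels(feature, prototypes, k=2):
    """Return the k labels whose prototypes are closest to `feature`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(prototypes, key=lambda lab: sq_dist(feature, prototypes[lab]))
    return ranked[:k]


# Hypothetical clear data: 2-D features with trusted labels.
clear_features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8], [0.4, 0.6]]
clear_labels = ["happy", "happy", "sad", "sad", "neutral"]
prototypes = class_prototypes(clear_features, clear_labels)

# An ambiguous sample gets a candidate label set instead of a single label.
candidates = candidate_labels([0.7, 0.3], prototypes, k=2)
print(candidates)
```

Keeping a candidate set rather than committing to one label defers the final relabeling decision to a later step, which is the role the abstract assigns to the combination of prototypes and candidate sets.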

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Applications

2045

FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment

JiaHao Liu, Jiang Wu, Jinyu Chen, Miao Hu, Yipeng Zhou, Di Wu

Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such methods require clients to download others’ models. This not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by the local one-step-ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.
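The idea of server-side personalized aggregation can be sketched as follows: for one client, the server weights every collected model by how close it is to that client's guidance model, then forms a weighted average. The inverse-squared-distance weighting and the function names are assumptions chosen for illustration; the abstract states only that the weights come from minimizing the distance between personalized and guidance models.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def personalized_weights(guidance, client_models, eps=1e-12):
    """Normalized weights that favour models close to the guidance model.

    Assumption: weights proportional to 1 / squared distance; eps avoids
    division by zero when a model coincides with the guidance model.
    """
    inverse = [1.0 / (squared_distance(guidance, m) + eps) for m in client_models]
    total = sum(inverse)
    return [v / total for v in inverse]


def aggregate(weights, client_models):
    """Weighted average of the client models, computed on the server."""
    dim = len(client_models[0])
    return [sum(w * m[k] for w, m in zip(weights, client_models)) for k in range(dim)]


# Hypothetical flattened models uploaded by three clients.
client_models = [[1.0, 0.0], [0.9, 0.1], [-1.0, 2.0]]
# Hypothetical guidance model for client 0 (e.g. after one extra local step).
guidance_0 = [1.0, 0.05]

weights_0 = personalized_weights(guidance_0, client_models)
personalized_0 = aggregate(weights_0, client_models)
print(weights_0, personalized_0)
```

Because all distances are computed on the server from models it already holds, no client needs to download another client's model, which is the communication and privacy point the abstract makes.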

List of keywords

Machine Learning -> ML: Federated learning

2048

Video Diffusion Models with Local-Global Context Guidance

Siyuan Yang, Lu Zhang, Yu Liu, Zhizhuo Jiang, You He

Diffusion models have emerged as a powerful paradigm in video synthesis tasks including prediction, generation, and interpolation. Due to the limitation of the computational budget, existing methods usually implement conditional diffusion models with an autoregressive inference pipeline, in which the future fragment is predicted based on the distribution of adjacent past frames. However, only the conditions from a few previous frames can’t capture the global temporal coherence, leading to inconsistent or even outrageous results in long-term video prediction. In this paper, we propose a Local-Global Context guided Video Diffusion model (LGC-VD) to capture multi-perception conditions for producing high-quality videos in both conditional/unconditional settings. In LGC-VD, the UNet is implemented with stacked residual blocks with self-attention units, avoiding the undesirable computational cost in 3D Conv. We construct a local-global context guidance strategy to capture the multi-perceptual embedding of the past fragment to boost the consistency of future prediction. Furthermore, we propose a two-stage training strategy to alleviate the effect of noisy frames for more stable predictions. Our experiments demonstrate that the proposed method achieves favorable performance on video prediction, interpolation, and unconditional video generation.

List of keywords

Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Video analysis and understanding

2051

Continuous-Time Graph Learning for Cascade Popularity Prediction

Xiaodong Lu, Shuo Ji, Le Yu, Leilei Sun, Bowen Du, Tongyu Zhu

Information propagation on social networks could be modeled as cascades, and many efforts have been made to predict the future popularity of cascades. However, most of the existing research treats a cascade as an individual sequence. Actually, the cascades might be correlated with each other due to the shared users or similar topics. Moreover, the preferences of users and semantics of a cascade are usually continuously evolving over time. In this paper, we propose a continuous-time graph learning method for cascade popularity prediction, which first connects different cascades via a universal sequence of user-cascade and user-user interactions and then chronologically learns on the sequence by maintaining the dynamic states of users and cascades. Specifically, for each interaction, we present an evolution learning module to continuously update the dynamic states of the related users and cascade based on their currently encoded messages and previous dynamic states. We also devise a cascade representation learning component to embed the temporal information and structural information carried by the cascade. Experiments on real-world datasets demonstrate the superiority and rationality of our approach.

List of keywords

Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Mining spatial and/or temporal data

2072

Learnable Surrogate Gradient for Direct Training Spiking Neural Networks

Shuang Lian, Jiangrong Shen, Qianhui Liu, Ziming Wang, Rui Yan, Huajin Tang

Spiking neural networks (SNNs) have increasingly drawn massive research attention due to biological interpretability and efficient computation. Recent achievements are devoted to utilizing the surrogate gradient (SG) method to avoid the dilemma of non-differentiability of spiking activity to directly train SNNs by backpropagation. However, the fixed width of the SG leads to gradient vanishing and mismatch problems, thus limiting the performance of directly trained SNNs. In this work, we propose a novel perspective to unlock the width limitation of SG, called the learnable surrogate gradient (LSG) method. The LSG method modulates the width of SG according to the change of the distribution of the membrane potentials, which is identified to be related to the decay factors based on our theoretical analysis. Then we introduce the trainable decay factors to implement the LSG method, which can optimize the width of SG automatically during training to avoid the gradient vanishing and mismatch problems caused by the limited width of SG. We evaluate the proposed LSG method on both image and neuromorphic datasets. Experimental results show that the LSG method can effectively alleviate the blocking of gradient propagation caused by the limited width of SG when training deep SNNs directly. Meanwhile, the LSG method can help SNNs achieve competitive performance on both latency and accuracy.
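The width problem mentioned in this abstract can be seen in a few lines. The sketch below uses a rectangular surrogate gradient, one common choice, with a hand-set width; it does not implement the LSG method or its trainable decay factors. The threshold, widths, and membrane potential values are hypothetical.

```python
def spike(u, threshold=1.0):
    """Heaviside spike function; its true derivative is zero almost everywhere."""
    return 1.0 if u >= threshold else 0.0


def rectangular_sg(u, threshold=1.0, width=1.0):
    """Rectangular surrogate gradient of the spike function w.r.t. u.

    Equals 1/width inside a window of the given width centred on the
    threshold, and 0 outside that window.
    """
    return 1.0 / width if abs(u - threshold) < width / 2.0 else 0.0


# Hypothetical membrane potentials spread around the firing threshold.
membrane_potentials = [-0.5, 0.2, 0.9, 1.1, 1.8, 2.6]

# A narrow window passes gradient only for potentials near the threshold.
narrow = [rectangular_sg(u, width=0.5) for u in membrane_potentials]
# A wide window passes a smaller gradient for every potential in this list.
wide = [rectangular_sg(u, width=4.0) for u in membrane_potentials]

print(narrow)  # [0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
print(wide)    # [0.25, 0.25, 0.25, 0.25, 0.25, 0.25]
```

With the narrow window, four of the six neurons receive no gradient at all, which is the vanishing behaviour a fixed width can cause when the distribution of membrane potentials shifts away from the threshold.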

List of keywords

Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems

2077

Fluid Dynamics-Inspired Network for Infrared Small Target Detection

Tianxiang Chen, Qi Chu, Bin Liu, Nenghai Yu

Most infrared small target detection (ISTD) networks focus on building effective neural blocks or feature fusion modules, but none describes the ISTD process from the image evolution perspective. The directional evolution of image pixels influenced by convolution, pooling and surrounding pixels is analogous to the movement of fluid elements constrained by surrounding variables and particles. Inspired by this, we explore a novel research direction by abstracting the movement of pixels in the ISTD process as the flow of fluid in fluid dynamics (FD). Specifically, a new Fluid Dynamics-Inspired Network (FDI-Net) is devised for ISTD. Based on the Taylor Central Difference (TCD) method, the TCD feature extraction block is designed, where convolution and Transformer structures are combined for local and global information. The pixel motion equation during the ISTD process is derived from the Navier–Stokes (N-S) equation, yielding an N-S Refinement Module that refines extracted features with edge details. Thus, the TCD feature extraction block determines the primary movement direction of pixels during detection, while the N-S Refinement Module corrects some skewed directions of the pixel stream to supplement the edge details. Experiments on IRSTD-1k and SIRST demonstrate that our method achieves SOTA performance in terms of evaluation metrics.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Recognition (object detection, categorization)

2094

Video Frame Interpolation with Densely Queried Bilateral Correlation

Chang Zhou, Jie Liu, Jie Tang, Gangshan Wu

Video Frame Interpolation (VFI) aims to synthesize non-existent intermediate frames between existent frames. Flow-based VFI algorithms estimate intermediate motion fields to warp the existent frames. Real-world motions’ complexity and the reference frame’s absence make motion estimation challenging. Many state-of-the-art approaches explicitly model the correlations between two neighboring frames for more accurate motion estimation. In common approaches, the receptive field of correlation modeling at higher resolution depends on the motion fields estimated beforehand. Such receptive field dependency makes common motion estimation approaches poor at coping with small and fast-moving objects. To better model correlations and to produce more accurate motion fields, we propose the Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem and thus is more friendly to small and fast-moving objects. The motion fields generated with the help of DQBC are further refined and up-sampled with context features. After the motion fields are fixed, a CNN-based SynthNet synthesizes the final interpolated frame. Experiments show that our approach enjoys higher accuracy and less inference time than the state-of-the-art. Source code is available at https://github.com/kinoud/DQBC.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding

2099

Robust Reinforcement Learning via Progressive Task Sequence

Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu

Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address the robust RL problem by solving a max-min problem. The main idea is to maximize the cumulative reward under the worst-possible perturbations. However, the worst-case optimization either leads to overly conservative solutions or unstable training process, which further affects the policy robustness and generalization performance. In this paper, we tackle this problem from both formulation definition and algorithm design. First, we formulate the robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both the worst cases and the non-worst cases. Then, we propose a novel framework DRRL to solve the max-expectation optimization. Given our definition of the feasible tasks, a task generation and sequencing mechanism is introduced to dynamically output tasks at appropriate difficulty level for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning to improve the policy robustness and the training stability. Finally, extensive experiments demonstrate that the proposed method exhibits significant performance on the unmanned CarRacing game and multiple high-dimensional MuJoCo environments.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Agent-based and Multi-agent Systems -> MAS: Agent theories and models

2103

Explainable Reinforcement Learning via a Causal World Model

Zhongwei Yu, Jingqing Ruan, Xing Dengpeng

Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.

List of keywords

Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Causality
Machine Learning -> ML: Reinforcement learning

2106

Handling Learnwares Developed from Heterogeneous Feature Spaces without Auxiliary Data

Peng Tan, Zhi-Hao Tan, Yuan Jiang, Zhi-Hua Zhou

The learnware paradigm proposed by Zhou [2016] aims to provide a model-sharing platform where users can solve their problems by reusing existing efforts instead of starting from scratch. A learnware consists of a well-performing trained model and a specification that enables the model to be adequately identified according to the user's requirement. Previous studies concentrated on the homogeneous case, where models share the same feature space. However, in real-world scenarios, models usually come from different feature spaces. If the learnware market can handle heterogeneous feature spaces, all well-performing models built for a particular task with arbitrary feature spaces can be submitted to the market, and the market can accommodate them well and identify the helpful model for the user even if the feature space is not aligned. This problem would be easier with the help of extra auxiliary data connecting all the different feature spaces, but such data can hardly be obtained in reality. In this paper, we provide a general framework for accommodating heterogeneous learnwares without using any auxiliary data. The key idea is to exploit the specifications to construct the relationship between different feature spaces. Furthermore, we give a matrix factorization-based implementation of the overall procedure for constructing and exploiting the heterogeneous learnware market. Experiments on real-world tasks validate the efficacy of our method.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Multi-task and transfer learning

2118

Hawkes Process Based on Controlled Differential Equations

Minju Jo, Seungji Kook, Noseong Park

Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events, since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes process based on controlled differential equations (HP-CDE), adopting the neural controlled differential equation (neural CDE) technology, which is a continuous analogue of RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated, preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. Moreover, as both Hawkes processes and neural CDEs were first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Mining text, web, social media
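
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of a classical univariate Hawkes process with an exponential kernel, evaluated on irregularly spaced event times. It is not the HP-CDE model; the function names and the parameters `mu`, `alpha`, and `beta` are assumptions chosen for illustration. For this particular kernel the log-likelihood has a closed form, which is what the second function computes.

```python
import math

def intensity(t, events, mu, alpha, beta):
    """Conditional intensity mu + sum over earlier events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood on [0, horizon]: sum of log-intensities minus the integrated intensity."""
    log_term = sum(math.log(intensity(ti, events, mu, alpha, beta)) for ti in events)
    # Closed-form integral of the intensity for the exponential kernel.
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - ti)) for ti in events
    )
    return log_term - compensator

# Irregular inter-arrival times (illustrative values, not from any dataset).
events = [0.3, 0.35, 2.1, 7.8]
print(log_likelihood(events, horizon=10.0, mu=0.5, alpha=0.8, beta=1.0))
```

Each past event raises the intensity by `alpha`, and the effect decays at rate `beta`, so closely spaced events such as 0.3 and 0.35 reinforce each other more than isolated ones.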

2120

Truthful Auctions for Automated Bidding in Online Advertising

Yidan Xing, Zhilin Zhang, Zhenzhe Zheng, Chuan Yu, Jian Xu, Fan Wu, Guihai Chen

Automated bidding, an emerging intelligent decision-making paradigm powered by machine learning, has become popular in online advertising. Advertisers in automated bidding evaluate the cumulative utilities and have private financial constraints over multiple ad auctions in a long-term period. Based on these distinct features, we consider a new ad auction model for automated bidding: the values of advertisers are public while the financial constraints, such as budget and return on investment (ROI) rate, are private types. We derive the truthfulness conditions with respect to private constraints for this multi-dimensional setting, and demonstrate that any feasible allocation rule can be equivalently reduced to a series of non-decreasing functions on budget. However, the resulting allocation mapped from these non-decreasing functions generally follows an irregular shape, making it difficult to obtain a closed-form expression for the auction objective. To overcome this design difficulty, we propose a family of truthful automated bidding auctions with personalized rank scores, similar to the Generalized Second-Price (GSP) auction. The intuition behind our design is to leverage personalized rank scores as the criteria to allocate items, and to compute a critical ROI that transforms the constraints on budget to the same dimension as ROI. The experimental results demonstrate that the proposed auction mechanism outperforms widely used ad auctions, such as the first-price auction and the second-price auction, in various automated bidding environments.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design

2144

Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels

Hao-Tian Li, Tong Wei, Hao Yang, Kun Hu, Chong Peng, Li-Bo Sun, Xun-Liang Cai, Min-Ling Zhang

Deep neural networks have shown promising results on a wide variety of tasks using large-scale and well-annotated training datasets. However, data collected from real-world applications can suffer from two prevalent biases, i.e., long-tailed class distribution and label noise. Previous efforts on long-tailed learning and label-noise learning can only address a single type of data bias, leading to a severe deterioration of their performance. In this paper, we propose a distance-based sample selection algorithm called Stochastic Feature Averaging (SFA), which fits a Gaussian using the exponential running average of class centroids to capture uncertainty in representation space due to label noise and data scarcity. With SFA, we detect noisy samples based on their distances to class centroids sampled from this Gaussian distribution. Based on the identified clean samples, we then propose to train an auxiliary balanced classifier to improve the generalization for the minority class and facilitate the update of Gaussian parameters. Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. Further, we propose to combine SFA with the sample-selection approach, distribution-robust, and noise-robust loss functions, resulting in significant improvement in performance over the baselines. Our code is available at https://github.com/HotanLee/SFA

List of keywords

Machine Learning -> ML: Weakly supervised learning
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning

2147

On the Complexity of Counterfactual Reasoning

Yunqiu Han, Yizuo Chen, Adnan Darwiche

We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks—used in standard counterfactual reasoning that contemplates two worlds, real and imaginary—to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with partially specified SCMs that are coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.

List of keywords

Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Knowledge Representation and Reasoning -> KRR: Causality
Uncertainty in AI -> UAI: Bayesian networks

2160

Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose

Yichen Zhang, Jiehong Lin, Ke Chen, Zelin Xu, Yaowei Wang, Kui Jia

The domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, into a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations of regression tasks. Moreover, learning unified implicit neural functions to estimate the relative direction and distance of targets to their nearest class bins aims to refine target classification predictions, which can gain robust performance against inconsistent feature scaling sensitive to UDA regressors. Experimental results on three public benchmarks of the challenging 6D pose estimation task verify the effectiveness of our method, which consistently achieves superior performance to the state-of-the-art for UDA on 6D pose estimation. Code and pre-trained models are available at https://github.com/Gorilla-Lab-SCUT/MAST.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

2178

On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling

Zangir Iklassov, Dmitrii Medvedev, Ruben Solozabal Ochoa de Retana, Martin Takac

This paper studies the use of Curriculum Learning on Reinforcement Learning (RL) to improve the performance of the dispatching policies learned on the Job-shop Scheduling Problem (JSP). Current works in the literature present a large optimality gap when learning end-to-end solutions on this problem. In this regard, we identify the difficulty for RL to learn directly on large instances as part of the issue and use Curriculum Learning (CL) to mitigate this effect. In particular, CL sequences the learning process in a curriculum of tasks of increasing complexity, which allows learning on large instances that would otherwise be impossible to learn from scratch. In this paper, we present a size-agnostic model that enables us to demonstrate that current curriculum strategies have a major impact on the quality of the solution inferred. In addition, we introduce a novel Reinforced Adaptive Staircase Curriculum Learning (RASCL) strategy, which adjusts the difficulty level during the learning process by revisiting the worst-performing instances. Experiments on Taillard's and Demirkol's datasets show that the presented approach significantly improves on the current state-of-the-art models on the JSP. It reduces the average optimality gap from 19.35% to 10.46% on Taillard's instances and from 38.43% to 18.85% on Demirkol's instances.

List of keywords

Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Scheduling

2184

Exploring Safety Supervision for Continual Test-time Domain Adaptation

Xu Yang, Yanan Gu, Kun Wei, Cheng Deng

Continual test-time domain adaptation aims to adapt a source pre-trained model to a continually changing target domain without using any source data. Unfortunately, existing methods based on pseudo-label learning suffer from the changing target domain environment, and the quality of generated pseudo-labels is attenuated due to the domain shift, leading to instantaneous negative learning and long-term knowledge forgetting. To solve these problems, in this paper, we propose a simple yet effective framework for exploring safety supervision with three elaborate strategies: Label Safety, Sample Safety, and Parameter Safety. Firstly, to select reliable pseudo-labels, we define and adjust the confidence threshold in a self-adaptive manner according to the test-time learning status. Secondly, a soft-weighted contrastive learning module is presented to explore the highly-correlated samples and discriminate uncorrelated ones, improving the instantaneous efficiency of the model. Finally, we frame a Soft Weight Alignment strategy to normalize the distance between the parameters of the adapted model and the source pre-trained model, which alleviates the long-term problem of knowledge forgetting and significantly improves the accuracy of the adapted model in the late adaptation stage. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on several benchmark datasets.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Representation learning

2193

Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences

Jie Qiao, Ruichu Cai, Siyu Wu, Yu Xiang, Keli Zhang, Zhifeng Hao

Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality, which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable beyond applications, especially when dealing with low-resolution discrete-time event sequences; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., the causal relationship that occurred simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structural Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among event types in discrete-time event sequences. The proposed method features Expectation-Maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and that the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.

List of keywords

Uncertainty in AI -> UAI: Causality, structural causal models and causal inference

2225

Privacy-Preserving End-to-End Spoken Language Understanding

Yinggui Wang, Wei Huang, Le Yang

Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can contain a lot of user-sensitive information, such as gender, identity, and sensitive content. New types of security and privacy breaches have thus emerged. Users do not want to expose their personal sensitive information to malicious attacks by untrusted third parties. Thus, the SLU system needs to ensure that a potential malicious attacker cannot deduce the sensitive attributes of the users, while it should avoid greatly compromising the SLU accuracy. To address the above challenge, this paper proposes a novel SLU multi-task privacy-preserving model to prevent both the speech recognition (ASR) and identity recognition (IR) attacks. The model uses the hidden layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, and the other two types of information are removed to obtain a privacy-secure hidden layer. In order to achieve good balance between efficiency and privacy, we introduce a new mechanism of model pre-training, namely joint adversarial training, to further enhance the user privacy. Experiments over two SLU datasets show that the proposed method can reduce the accuracy of both the ASR and IR attacks close to that of a random guess, while leaving the SLU performance largely unaffected.

List of keywords

Natural Language Processing -> NLP: Dialogue and interactive systems
Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Speech

2228

CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation

Tao Lei, Rui Sun, Xuan Wang, Yingbo Wang, Xi He, Asoke Nandi

The hybrid architecture of convolutional neural networks (CNNs) and Transformers is very popular for medical image segmentation. However, it suffers from two challenges. First, although a CNN branch can capture local image features using vanilla convolution, it cannot achieve adaptive feature learning. Second, although a Transformer branch can capture global features, it ignores channel and cross-dimensional self-attention, resulting in low segmentation accuracy on complex-content images. To address these challenges, we propose a novel hybrid architecture of convolutional neural networks hand in hand with vision Transformers (CiT-Net) for medical image segmentation. Our network has two advantages. First, we design a dynamic deformable convolution and apply it to the CNN branch, which overcomes the weak feature extraction ability due to fixed-size convolution kernels and the stiff design of sharing kernel parameters among different inputs. Second, we design a shifted-window adaptive complementary attention module and a compact convolutional projection. We apply them to the Transformer branch to learn the cross-dimensional long-term dependency for medical images. Experimental results show that our CiT-Net provides better medical image segmentation results than popular SOTA methods. Besides, our CiT-Net requires fewer parameters and lower computational cost and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/CiT-Net.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision

2229

On Lower Bounds for Maximin Share Guarantees

Halvard Hummel

We study the problem of fairly allocating a set of indivisible items to a set of agents with additive valuations. Recently, Feige et al. (WINE’21) proved that a maximin share (MMS) allocation exists for all instances with n agents and no more than n + 5 items. Moreover, they proved that an MMS allocation is not guaranteed to exist for instances with 3 agents and at least 9 items, or n ≥ 4 agents and at least 3n + 3 items. In this work, we shrink the gap between these upper and lower bounds for guaranteed existence of MMS allocations. We prove that for any integer c > 0, there exists a number of agents n_c such that an MMS allocation exists for any instance with n ≥ n_c agents and at most n + c items, where n_c ≤ ⌊0.6597^c · c!⌋ for allocation of goods and n_c ≤ ⌊0.7838^c · c!⌋ for chores. Furthermore, we show that for n ≠ 3 agents, all instances with n + 6 goods have an MMS allocation.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
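
As a reading aid for the abstract above, here is a small sketch of two quantities it mentions. The first function computes one agent's maximin share by brute force under additive valuations, using the standard definition (the best worst-bundle value the agent can secure when splitting the items into n bundles). The second evaluates the stated bounds ⌊0.6597^c · c!⌋ for goods and ⌊0.7838^c · c!⌋ for chores. The function names and the example valuations are assumptions for illustration, not taken from the paper.

```python
from itertools import product
from math import factorial, floor

def maximin_share(values, n):
    """Brute-force maximin share of one agent: max over all n-bundle splits of the minimum bundle value."""
    best = float("-inf")
    for assignment in product(range(n), repeat=len(values)):
        bundles = [0] * n
        for item_value, bundle_index in zip(values, assignment):
            bundles[bundle_index] += item_value
        best = max(best, min(bundles))
    return best

def agent_threshold_bound(c, kind="goods"):
    """Upper bound on n_c quoted in the abstract: floor(r^c * c!) with r depending on goods or chores."""
    ratio = {"goods": 0.6597, "chores": 0.7838}[kind]
    return floor(ratio ** c * factorial(c))

# Illustrative valuations of a single agent over five goods, split between two agents.
print(maximin_share([3, 3, 2, 2, 2], n=2))
for c in range(1, 8):
    print(c, agent_threshold_bound(c, "goods"), agent_threshold_bound(c, "chores"))
```

The brute-force search enumerates n^m assignments for m items, so it is only meant for tiny instances.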

2230

Optimal Decision Trees For Interpretable Clustering with Constraints

Pouya Shati, Eldan Cohen, Sheila McIlraith

Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions; however, existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Machine Learning -> ML: Clustering

2241

Uncovering the Largest Community in Social Networks at Scale

Shohei Matsugu, Yasuhiro Fujiwara, Hiroaki Shiokawa

The Maximum k-Plex Search (MPS) can find the largest k-plex, which is a generalization of the largest clique. Although MPS is commonly used in AI to effectively discover real-world communities of social networks, existing MPS algorithms suffer from high computational costs because they iteratively scan numerous nodes to find the largest k-plex. Here, we present an efficient MPS algorithm called Branch-and-Merge (BnM), which outputs an exact maximum k-plex. BnM merges unnecessary nodes to explore a smaller graph than the original one. Extensive evaluations on real-world social networks demonstrate that BnM significantly outperforms other state-of-the-art MPS algorithms in terms of running time.

List of keywords

Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Applications
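
To make the k-plex notion in the abstract above concrete, the sketch below checks the standard definition (a vertex set S is a k-plex if every vertex in S is adjacent to at least |S| - k other vertices of S, so a 1-plex is a clique) and finds a maximum k-plex by exhaustive search. This is not the BnM algorithm; the function names and the toy graph are assumptions for illustration, and some definitions additionally require connectivity, which is not enforced here.

```python
from itertools import combinations

def is_k_plex(adj, nodes, k):
    """True if every vertex in `nodes` has at least len(nodes) - k neighbours inside `nodes`."""
    members = set(nodes)
    return all(len(adj[v] & members) >= len(members) - k for v in members)

def max_k_plex_bruteforce(adj, k):
    """Exhaustive search from largest to smallest subsets; exponential, for tiny graphs only."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if is_k_plex(adj, candidate, k):
                return set(candidate)
    return set()

# Toy graph: a 4-cycle 0-1-2-3-0, stored as adjacency sets without self-loops.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(max_k_plex_bruteforce(cycle, k=1))  # a largest clique (a single edge)
print(max_k_plex_bruteforce(cycle, k=2))  # the whole cycle qualifies as a 2-plex
```

The exhaustive search illustrates why the abstract stresses running time: the number of candidate subsets grows exponentially with the number of nodes.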

2250

DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series

Chaoqun Wang, Yijun Li, Xiangqian Sun, Qi Wu, Dongdong Wang, Zhixiang Huang

Time series forecasting is prevalent in various real-world applications. Despite the promising results of deep learning models in time series forecasting, especially Recurrent Neural Networks (RNNs), the explanations of time series models, which are critical in high-stakes applications, have received little attention. In this paper, we propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM. Conventionally, the interpretability of RNNs only concentrates on variable importance and time importance. We additionally distinguish between the instantaneous influence of newly arriving data and the long-term effects of historical data. Specifically, DeLELSTM consists of two components, i.e., a standard LSTM and a tensorized LSTM. The tensorized LSTM assigns each variable a unique hidden state, making up a matrix $h_t$, and the standard LSTM models all the variables with a shared hidden state $H_t$. By decomposing $H_t$ into a linear combination of the past information $h_{t-1}$ and the fresh information $h_t - h_{t-1}$, we can get the instantaneous influence and the long-term effect of each feature. In addition, the use of linear regression makes the explanation transparent and clear. We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets. Extensive experiments show that the proposed method achieves competitive performance against the baseline methods and provides a reliable explanation relative to domain knowledge.

List of keywords

Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Time series and data streams

2261

Contour-based Interactive Segmentation

Polina Popenova, Danil Galeev, Anna Vorontsova, Anton Konushin

Recent advances in interactive segmentation (IS) allow speeding up and simplifying image editing and labeling greatly. The majority of modern IS approaches accept user input in the form of clicks. However, using clicks may require too many user interactions, especially when selecting small objects, minor parts of an object, or a group of objects of the same type. In this paper, we consider such a natural form of user interaction as a loose contour, and introduce a contour-based IS method. We evaluate the proposed method on the standard segmentation benchmarks, our novel UserContours dataset, and its subset UserContours-G containing difficult segmentation cases. Through experiments, we demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the required amount of user interactions.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision

2271

VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data

Yongxin Xu, Kai Yang, Chaohe Zhang, Peinie Zou, Zhiyuan Wang, Hongxin Ding, Junfeng Zhao, Yasha Wang, Bing Xie

Due to the insufficiency of electronic health records (EHR) data utilized in practical diagnosis prediction scenarios, most works are devoted to learning powerful patient representations either from structured EHR data (e.g., temporal medical events, lab test results, etc.) or unstructured data (e.g., clinical notes, etc.). However, synthesizing rich information from both of them still needs to be explored. Firstly, the heterogeneous semantic biases across them heavily hinder the synthesis of representation spaces, which is critical for diagnosis prediction. Secondly, the intermingled quality of partial clinical notes leads to inadequate representations of to-be-predicted patients. Thirdly, typical attention mechanisms mainly focus on aggregating information from similar patients, ignoring important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, which indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Representation learning

2272

Adaptive Estimation Q-learning with Uncertainty and Familiarity

Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li

One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimations. The most widely used off-policy algorithms currently suffer from over- or underestimation bias, which may lead to an unstable policy. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control the value estimation naturally and can adapt to each specific state-action pair. We theoretically prove the property of our familiarity term, which can even keep the expected estimation bias approximately 0, and experimentally demonstrate that our dynamic estimation can improve performance and prevent the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied in any off-policy actor-critic algorithm.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Ensemble methods

2276

Commonsense Knowledge Enhanced Sentiment Dependency Graph for Sarcasm Detection

Zhe Yu, Di Jin, Xiaobao Wang, Yawen Li, Longbiao Wang, Jianwu Dang

Sarcasm is widely utilized on social media platforms such as Twitter and Reddit. Sarcasm detection is required for analyzing people's true feelings, since sarcasm is commonly used to portray a reversed emotion opposing the literal meaning. The syntactic structure is the key to making better use of commonsense when detecting sarcasm. However, it is extremely challenging to effectively and explicitly explore the information implied in syntactic structure and commonsense simultaneously. In this paper, we apply the pre-trained COMET model to generate relevant commonsense knowledge, and explore a novel scenario of constructing a commonsense-augmented sentiment graph and a commonsense-replaced dependency graph for each text. Based on this, a Commonsense Sentiment Dependency Graph Convolutional Network (CSDGCN) framework is proposed to explicitly depict the role of external commonsense and inconsistent expressions over the context for sarcasm detection by interactively modeling the sentiment and dependency information. Experimental results on several benchmark datasets reveal that our proposed method outperforms the state-of-the-art methods in sarcasm detection and offers stronger interpretability.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Knowledge-aided learning
Natural Language Processing -> NLP: Text classification

2296

Complete Instances Mining for Weakly Supervised Instance Segmentation

Zecheng Li

Weakly supervised instance segmentation (WSIS) using only image-level labels is a challenging task due to the difficulty of aligning coarse annotations with the finer task. However, with the advancement of deep neural networks (DNNs), WSIS has garnered significant attention. Following a proposal-based paradigm, we encounter a redundant segmentation problem resulting from a single instance being represented by multiple proposals. To address this problem, we propose a novel approach for WSIS that focuses on the online refinement of complete instances through the use of a MaskIoU head to predict the quality of proposals and a Complete Instances Mining (CIM) strategy to explicitly model the redundant segmentation problem and generate refined pseudo labels. Our approach allows the network to become aware of multiple instances and complete instances, and we further improve its robustness through the incorporation of an Anti-noise strategy. Empirical evaluations on the PASCAL VOC 2012 and MS COCO datasets demonstrate that our method achieves state-of-the-art performance with a notable margin.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Weakly supervised learning

2309

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

Huawen Shen, Xiang Gao, Jin Wei, Liang Qiao, Yu Zhou, Qiang Li, Zhanzhan Cheng

Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as an image captioning problem, i.e., the input is a single-table image and the output is a table structure description in a specific text format, e.g., HTML. With the impressive success of Transformers in text generation tasks, these methods use the Transformer architecture to predict HTML table text in an autoregressive manner. However, tables come in a large variety of shapes and sizes. Autoregressive models usually suffer from the error accumulation problem as the length of the predicted text increases, which results in unsatisfactory performance for large tables. In this paper, we propose a novel image-to-text based TSR method that relieves the error accumulation problem and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture, with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting HTML table cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates an HTML tag sequence with a vertical-and-horizontal two-step 'scanning', which better fits the inherent 2D structure of image data, and (2) it performs substantially better for large tables (long sequence prediction) since it alleviates the error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Applications

2321

Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization

Jun Yu, Yingshuai Zheng, Shulan Ruan, Qi Liu, Zhiyuan Cheng, Jinze Wu

The key to video action detection lies in understanding the interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimates the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full use of it remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers with different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relations across continuing video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.

List of keywords

Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Video analysis and understanding

2338

CSGCL: Community-Strength-Enhanced Graph Contrastive Learning

Han Chen, Ziwen Zhao, Yuhua Li, Yixiong Zou, Ruixuan Li, Rui Zhang

Graph Contrastive Learning (GCL) is an effective way to learn generalized graph representations in a self-supervised manner, and has grown rapidly in recent years. However, the underlying community semantics has not been well explored by most previous GCL methods. Research that attempts to leverage communities in GCL regards them as having the same influence on the graph, leading to extra representation errors. To tackle this issue, we define “community strength” to measure the difference of influence among communities. Under this premise, we propose a Community-Strength-enhanced Graph Contrastive Learning (CSGCL) framework to preserve community strength throughout the learning process. Firstly, we present two novel graph augmentation methods, Communal Attribute Voting (CAV) and Communal Edge Dropping (CED), where the perturbations of node attributes and edges are guided by community strength. Secondly, we propose a dynamic “Team-up” contrastive learning scheme, where community strength is used to progressively fine-tune the contrastive objective. We report extensive experimental results on three downstream tasks: node classification, node clustering, and link prediction. CSGCL achieves state-of-the-art performance compared with other GCL methods, validating that community strength brings effectiveness and generality to graph representations. Our code is available at https://github.com/HanChen-HUST/CSGCL.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Self-supervised Learning

2339

Generalized Discriminative Deep Non-Negative Matrix Factorization Based on Latent Feature and Basis Learning

Zijian Yang, Zhiwei Li, Lu Sun

As a powerful tool for data representation, deep NMF has attracted a lot of attention in recent years. Current deep NMF builds the multi-layer structure by decomposing either the basis matrix or the feature matrix into multiple factors, which may complicate the learning process when data is insufficient or exhibits a simple representation. To overcome these limitations, a novel algorithm called Generalized Deep Non-negative Matrix Factorization (GDNMF) is proposed, which generalizes several NMF and deep NMF methods in a unified framework. GDNMF simultaneously performs decomposition on both features and bases, learning a hierarchical data representation based on multi-level bases. To further improve the latent representation and enhance its flexibility, GDNMF mutually reinforces the shallow linear model and the deep non-linear model. Moreover, to utilize the limited prior information, semi-supervised GDNMF is proposed by treating partial label information as soft constraints in the multi-layer structure. An efficient two-phase algorithm is developed to optimize the proposed problem. Experimental results on five real-world datasets verify the superior performance of GDNMF compared with state-of-the-art NMF-based methods.

List of keywords

Machine Learning -> ML: Clustering
Machine Learning -> ML: Weakly supervised learning

2358

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang

Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task. However, lacking the perception of linguistic knowledge and information, recent vision models suffer from two problems: (1) the pure vision-based query results in attention drift, which usually causes poor recognition and is summarized as the linguistic insensitive drift (LID) problem in this paper; (2) the visual feature is suboptimal for recognition in some vision-missing cases (e.g., occlusion). To address these issues, we propose a Linguistic Perception Vision model (LPV), which explores the linguistic capability of vision models for accurate text recognition. To alleviate the LID problem, we introduce a Cascade Position Attention (CPA) mechanism that obtains high-quality and accurate attention maps through step-wise optimization and linguistic information mining. Furthermore, a Global Linguistic Reconstruction Module (GLRM) is proposed to improve the representation of visual features by perceiving the linguistic information in the visual space, which gradually converts visual features into semantically rich ones during the cascade process. Different from previous methods, our method obtains SOTA results while keeping low complexity (92.4% accuracy with only 8.11M parameters). Code is available at https://github.com/CyrilSterling/LPV.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Vision and language

2362

G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns

Qin Zhang, Ze Lin Shi, Xiaolin Zhang, Xiaojun Chen, Philippe Fournier-Viger, Shirui Pan

Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real life, models are often applied to data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies, are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one by augmenting it with an extra proxy classifier. Under the constraint of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, G2Pxy imposes no specific requirement on the GNN architecture and shows good generalization.
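
The mixup step mentioned in the abstract can be illustrated with a minimal sketch. Mixup itself is the standard convex combination of two inputs; everything else here is an assumption made for illustration, since the abstract does not say how G2Pxy selects node pairs or mixing weights. The function names `mixup` and `inter_class_proxy`, the Beta-distributed weight, and the rule of pairing nodes from different known classes are all hypothetical.

```python
import random


def mixup(x_i, x_j, lam):
    """Convex combination of two feature vectors: lam * x_i + (1 - lam) * x_j."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def inter_class_proxy(features, labels, rng, alpha=1.0):
    """Mix two nodes from different known classes into one proxy feature vector.

    Assumption for illustration only: the pair is chosen uniformly at random
    and the mixing weight is drawn from Beta(alpha, alpha).
    Requires at least two distinct labels.
    """
    i = rng.randrange(len(features))
    others = [k for k, y in enumerate(labels) if y != labels[i]]
    j = rng.choice(others)
    lam = rng.betavariate(alpha, alpha)
    return mixup(features[i], features[j], lam)
```

A proxy produced this way lies on the segment between two known-class nodes in feature space, which is one plausible reading of "inter-class unknown proxies".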

List of keywords

Machine Learning -> ML: Classification
Data Mining -> DM: Mining graphs

2366

Enabling Abductive Learning to Exploit Knowledge Graph

Yu-Xuan Huang, Zequn Sun, Guangyao Li, Xiaobin Tian, Wang-Zhou Dai, Wei Hu, Yuan Jiang, Zhi-Hua Zhou

Most systems integrating data-driven machine learning with knowledge-driven reasoning usually rely on a specifically designed knowledge base to enable efficient symbolic inference. However, it could be cumbersome for the nonexpert end-users to prepare such a knowledge base in real tasks. Recent years have witnessed the success of large-scale knowledge graphs, which could be ideal domain knowledge resources for real-world machine learning tasks. However, these large-scale knowledge graphs usually contain much information that is irrelevant to a specific learning task. Moreover, they often contain a certain degree of noise. Existing methods can hardly make use of them because the large-scale probabilistic logical inference is usually intractable. To address these problems, we present ABductive Learning with Knowledge Graph (ABL-KG) that can automatically mine logic rules from knowledge graphs during learning, using a knowledge forgetting mechanism for filtering out irrelevant information. Meanwhile, these rules can form a logic program that enables efficient joint optimization of the machine learning model and logic inference within the Abductive Learning (ABL) framework. Experiments on four different tasks show that ABL-KG can automatically extract useful rules from large-scale and noisy knowledge graphs, and significantly improve the performance of machine learning with only a handful of labeled data.

List of keywords

Machine Learning -> ML: Knowledge-aided learning
Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Machine Learning -> ML: Weakly supervised learning

2409

DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency

Yike Yuan, Xinghe Fu, Yunlong Yu, Xi Li

In this paper, we propose a simple yet effective transformer framework for self-supervised learning called DenseDINO to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but that is neglected by existing self-supervised transformers, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces some extra input tokens called reference tokens to match the point-level features with the position prior. With the reference tokens, the model can maintain spatial consistency and deal with multi-object complex scene images, thus generalizing better on dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large-margin improvement (+7.2% mIoU) in semantic segmentation on PascalVOC under the linear evaluation protocol.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Representation learning

2446

Active Visual Exploration Based on Attention-Map Entropy

Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzcinski

Active visual exploration addresses the issue of limited sensor capabilities in real-world scenarios, where successive observations are actively chosen based on the environment. To tackle this problem, we introduce a new technique called Attention-Map Entropy (AME). It leverages the internal uncertainty of the transformer-based model to determine the most informative observations. In contrast to existing solutions, it does not require additional loss components, which simplifies the training. Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction and classification on publicly available datasets.
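
As a rough illustration of using attention uncertainty to choose the next observation, the sketch below computes the Shannon entropy of attention distributions and picks the most uncertain one. It is not the authors' implementation: the abstract does not state how attention maps are mapped to candidate observations, so treating each row as the attention associated with one candidate patch is an assumption, and the names `attention_entropy` and `pick_next_patch` are hypothetical.

```python
import math


def attention_entropy(row):
    """Shannon entropy (in nats) of one attention distribution.

    The row is normalized first, so unnormalized non-negative weights are accepted.
    """
    total = sum(row)
    probs = [w / total for w in row]
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_next_patch(attention_rows):
    """Return the index of the candidate whose attention row has the highest entropy.

    Assumption for illustration: row k holds the attention weights associated
    with candidate observation k.
    """
    entropies = [attention_entropy(r) for r in attention_rows]
    return max(range(len(entropies)), key=entropies.__getitem__)
```

A sharply peaked row has entropy near zero, while a uniform row over n entries reaches the maximum log(n), so the selection favors regions where the model's attention is least decided.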

List of keywords

Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Attention models
Robotics -> ROB: Robotics and vision

2451

Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning

Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan

Sharing intentions is crucial for efficient cooperation in communication-enabled multi-agent reinforcement learning. Recent work applies static or undirected graphs to determine the order of interaction. However, the static graph is not general for complex cooperative tasks, and the parallel message-passing update in the undirected graph with cycles cannot guarantee convergence. To solve this problem, we propose Deep Hierarchical Communication Graph (DHCG) to learn the dependency relationships between agents based on their messages. The relationships are formulated as directed acyclic graphs (DAGs), where the selection of the proper topology is viewed as an action and trained in an end-to-end fashion. To eliminate the cycles in the graph, we apply an acyclicity constraint as intrinsic rewards and then project the graph in the admissible solution set of DAGs. As a result, DHCG removes redundant communication edges for cost improvement and guarantees convergence. To show the effectiveness of the learned graphs, we propose policy-based and value-based DHCG. Policy-based DHCG factorizes the joint policy in an auto-regressive manner, and value-based DHCG factorizes the joint value function to individual value functions and pairwise payoff functions. Empirical results show that our method improves performance across various cooperative multi-agent tasks, including Predator-Prey, Multi-Agent Coordination Challenge, and StarCraft Multi-Agent Challenge.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

2459

Scalable Coupling of Deep Learning with Logical Reasoning

Marianne Defresne, Sophie Barbe, Thomas Schiex

In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. We empirically show that our loss function is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs, such as the symbolic, visual, or many-solutions Sudoku problems, as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and a posteriori control over predictions.

List of keywords

Machine Learning -> ML: Neuro-symbolic methods
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition

2466

CONGREGATE: Contrastive Graph Clustering in Curvature Spaces

Li Sun, Feiyang Wang, Junda Ye, Hao Peng, Philip S. Yu

Graph clustering is a longstanding research topic, and has achieved remarkable success with deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach where we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Clustering

2470

LGI-GT: Graph Transformers with Local and Global Operators Interleaving

Shuo Yin, Guoqiang Zhong

Since Transformers can alleviate some critical and fundamental problems of graph neural networks (GNNs), such as over-smoothing, over-squashing and limited expressiveness, they have been successfully applied to graph representation learning and achieved impressive results. However, although many works are dedicated to making graph Transformers (GTs) aware of structure and edge information through specifically tailored attention forms or graph-related positional and structural encodings, few works address the problem of how to construct high-performing GTs with modules of GNNs and Transformers. In this paper, we propose a novel graph Transformer with local and global operators interleaving (LGI-GT), in which we further design a new method for propagating embeddings of the [CLS] token for global information representation. Additionally, we propose an effective message passing module called edge enhanced local attention (EELA), which makes LGI-GT a full-attention GT. Extensive experiments demonstrate that LGI-GT performs consistently better than previous state-of-the-art GNNs and GTs, while ablation studies show the effectiveness of the proposed LGI scheme and EELA. The source code of LGI-GT is available at https://github.com/lgi-gt/LGI-GT.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning

2471

Reverse Engineering of Temporal Queries Mediated by LTL Ontologies

Marie Fortin, Boris Konev, Vladislav Ryzhikov, Yury Savateev, Frank Wolter, Michael Zakharyaschev

In reverse engineering of database queries, we aim to construct a query from a given set of answers and non-answers; it can then be used to explore the data further or as an explanation of the answers and non-answers. We investigate this query-by-example problem for queries formulated in positive fragments of linear temporal logic LTL over timestamped data, focusing on the design of suitable query languages and the combined and data complexity of deciding whether there exists a query in the given language that separates the given answers from non-answers. We consider both plain LTL queries and those mediated by LTL ontologies.

List of keywords

Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

2479

U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation

Zizhuo Li, Shihua Zhang, Jiayi Ma

Local context capturing has become the core factor for achieving leading performance in two-view correspondence learning. Recent advances have devised various local context extractors, but these typically adopt explicit neighborhood relation modeling that is restricted and inflexible. To address this issue, we introduce U-Match, an attentional graph neural network that has the flexibility to enable implicit local context awareness at multiple levels. Specifically, a hierarchy-aware graph representation (HAGR) module is designed and fleshed out by local context pooling and unpooling operations. The former encodes local context by adaptively sampling a set of nodes to form a coarse-grained graph, while the latter decodes local context by recovering the coarsened graph back to its original size. Moreover, an orthogonal fusion module is proposed for the collaborative use of the HAGR module, which integrates complementary local and global information into compact feature representations without redundancy. Extensive experiments on different visual tasks show that our method significantly surpasses the state of the art. In particular, U-Match attains an AUC at a 5-degree threshold of 60.53% on the challenging YFCC100M dataset without RANSAC, outperforming the strongest prior model by 8.61 absolute percentage points. Our code is publicly available at https://github.com/ZizhuoLi/U-Match.

List of keywords

Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Image and video retrieval

2490

An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia

Transformers achieve promising performance in structured document understanding because of their complex computation, but remain inefficient in time complexity. Existing lightweight transformers fail to represent different granularities in documents. Therefore, it is difficult for them to achieve a good trade-off between efficiency and performance. In this paper, we present an hourglass architecture for high-performance, low-computation document understanding. Specifically, we design a modality-guided dynamic token merging block, which not only makes the model learn multi-granularity representations, but also reduces the number of tokens in the middle layers. Considering that multi-modal interaction is critical for guiding the merge, we develop a symmetry cross attention (SCA) to efficiently interact with multi-modal information. SCA allows one modality's input to serve as the query when calculating cross attention with another modality. Extensive experiments on the FUNSD, SROIE, and CORD datasets demonstrate that our model achieves state-of-the-art performance and 1.9x faster inference than the state-of-the-art methods.

List of keywords

Natural Language Processing -> NLP: Information extraction
Machine Learning -> ML: Multi-modal learning

2491

CVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving

Zijian Song, Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhaoqi Wang

Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. Nowadays, we can easily access sensor data represented in multiple views. However, existing models do not take cross-view consistency as the main condition, i.e., there are divergences between the multimodal predictions from different views. A network that does not comprehend the 3D scene is neither practical nor effective, and it can put the downstream module in a dilemma. Our work models multimodality in a practical and reasonable way by maintaining cross-view consistency. We present a cross-view trajectory prediction method using shared 3D queries (CVTP3D). We employ a set of 3D queries shared across views to generate multi-goals that are cross-view consistent. We also propose a random mask method and coarse-to-fine cross-attention to capture robust cross-view features. As far as we know, this is the first work that introduces the outstanding top-down paradigm from the BEV detection field to a trajectory prediction problem. The results of experiments on two publicly available datasets show that CVTP3D achieves state-of-the-art performance with consistent cross-view predictions.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Machine learning for vision

2511

Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer

Wenjun Ke, Zongkai Tian, Qi Liu, Peng Wang, Jinhua Gao, Rui Qi

Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. In order to help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or by using end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence the resulting sentence can retain syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., Ontonotes and Wikiann, demonstrate that SAINT performs comparably to the state-of-the-art baselines.

List of keywords

Natural Language Processing -> NLP: Language generation
Natural Language Processing -> NLP: Named entities

2512

Engineering an Efficient Approximate DNF-Counter

Mate Soos, Divesh Aggarwal, Sourav Chakraborty, Kuldeep S Meel, Maciej Obremski

Model counting is a fundamental problem with many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it intractable to solve exactly. Much research has therefore been focused on obtaining approximate solutions, particularly in the form of (ε, δ) approximations. The primary contribution of this paper is a new approach, called dnfstream, to approximate #DNF counting that achieves (nearly) optimal time complexity and outperforms existing FPRAS. Our approach is based on the recent breakthrough in the context of union of sets in streaming. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.
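
For readers unfamiliar with approximate #DNF counting, the sketch below shows the classical Karp-Luby Monte Carlo estimator, one of the existing FPRAS-style schemes for this problem. It is not dnfstream, whose algorithm the abstract does not describe. The clause encoding (a dict from variable index to required truth value) and the function name `karp_luby_count` are choices made here for illustration.

```python
import random


def karp_luby_count(clauses, n_vars, samples, rng):
    """Estimate the number of satisfying assignments of a DNF formula.

    Each clause is a dict {variable_index: required_bool}; a clause is
    satisfied when every listed variable takes its required value.
    """
    # A clause with w literals is satisfied by exactly 2^(n_vars - w) assignments.
    weights = [2 ** (n_vars - len(c)) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick a clause with probability proportional to its satisfying count.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Draw a uniform assignment, then force it to satisfy clause i.
        assignment = [rng.random() < 0.5 for _ in range(n_vars)]
        for var, val in clauses[i].items():
            assignment[var] = val
        # Count the sample only if clause i is the first clause it satisfies,
        # so every satisfying assignment is credited to exactly one clause.
        first = next(
            j for j, c in enumerate(clauses)
            if all(assignment[v] == b for v, b in c.items())
        )
        hits += first == i
    return total * hits / samples
```

For example, for the formula x0 OR x1 over three variables, `karp_luby_count([{0: True}, {1: True}], 3, 20000, random.Random(0))` returns an estimate close to the exact count of 6. The number of samples needed for a given (ε, δ) guarantee grows with the number of clauses, which is the cost that faster schemes aim to reduce.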

List of keywords

Constraint Satisfaction and Optimization -> CSO: Solvers and tools

2529

Efficient Online Decision Tree Learning with Active Feature Acquisition

Arman Rahbar, Ziyu Ye, Yuxin Chen, Morteza Haghir Chehreghani

Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real world applications, both feature values and the labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform (i.e., making costly feature queries) on a patient in order to make a diagnosis decision (i.e., predicting labels). We provide a fresh perspective to tackle this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values with a minimal cost, while using a posterior sampling scheme to maintain a low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.

List of keywords

Machine Learning -> ML: Online learning
Data Mining -> DM: Mining data streams
Machine Learning -> ML: Active learning

2543

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

Runzhong Zhang, Suchen Wang, Yueqi Duan, Yansong Tang, Yue Zhang, Yap-Peng Tan

In this paper, we propose an HOI-aware adaptive network named AdaAct for weakly-supervised action segmentation. Most existing methods learn a fixed network to predict the action of each frame with the neighboring frame features. However, this would result in ambiguity when estimating similar actions, such as pouring juice and pouring coffee. To address this, we aim to exploit temporally global but spatially local human-object interactions (HOI) as video-level prior knowledge for action segmentation. The long-term HOI sequence provides crucial contextual information to distinguish ambiguous actions, where our network dynamically adapts to the given HOI sequence at test time. More specifically, we first design a video HOI encoder that extracts, selects, and integrates the most representative HOI throughout the video. Then, we propose a two-branch HyperNetwork to learn an adaptive temporal encoder, which automatically adjusts the parameters based on the HOI information of various videos on the fly. Extensive experiments on two widely-used datasets including Breakfast and 50Salads demonstrate the effectiveness of our method under different evaluation metrics.

List of keywords

Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Action and behavior recognition

2554

GeNAS: Neural Architecture Search with Better Generalization

Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Youngjoon Yoo

Neural Architecture Search (NAS) aims to automatically excavate the optimal network architecture with superior test performance. Recent NAS approaches rely on the validation loss or accuracy to find the superior network for the target data. In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to the state-of-the-art NAS methods. Notably, the resultant architecture found by the flatness measure generalizes robustly with regard to various data distribution shifts (e.g., ImageNet-V2, -A, -O), as well as various tasks such as object detection and semantic segmentation.

List of keywords

Computer Vision -> CV: Machine learning for vision

2562

Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation

Songhao Jiang, Yan Chu, Zhengkui Wang, Tianxing Ma, 王瀚麟, Wenxuan Lu, Tianning Zang, Wang Bo

Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model’s capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and cuts and mixes the subsentences between the misclassified and correctly classified samples accordingly. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. Finally, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective and more explainable than typical classification models, hidden-level mixup models, and input-level mixup models.

List of keywords

Natural Language Processing -> NLP: Text classification
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Natural Language Processing -> NLP: Tools

2576

Synthesizing Resilient Strategies for Infinite-Horizon Objectives in Multi-Agent Systems

David Klaska, Antonin Kucera, Martin Kurečka, Vit Musil, Petr Novotný, Vojtech Rehak

We consider the problem of synthesizing resilient and stochastically stable strategies for systems of cooperating agents striving to minimize the expected time between consecutive visits to selected locations in a known environment. A strategy profile is resilient if it retains its functionality even if some of the agents fail, and stochastically stable if the visiting time variance is small. We design a novel specification language for objectives involving resilience and stochastic stability, and we show how to efficiently compute strategy profiles (for both autonomous and coordinated agents) optimizing these objectives. Our experiments show that our strategy synthesis algorithm can construct highly non-trivial and efficient strategy profiles for environments with general topology.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Planning and Scheduling -> PS: Robot planning

2577

An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations

Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu

Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA, an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses this problem by using 1) improved Graph Neural Networks for learning name-invariant formula representations that are tailored to their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains, with improvements of up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.

List of keywords

Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Machine Learning -> ML: Applications
Machine Learning -> ML: Representation learning

2587

Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening

Mingsong Li, Yikun Liu, Tao Xiao, Yuwen Huang, Gongping Yang

Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.

List of keywords

Computer Vision -> CV: Applications
Machine Learning -> ML: Applications

2590

DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning

Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li

Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing works in MARL. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with a rigorous $(\epsilon, \delta)$-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise, as commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
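
For orientation, the baseline that the abstract contrasts with (perturbing a message directly with predefined DP noise) can be sketched with the classic Gaussian mechanism. This is not the DPMAC sender. The function names, the clipping step, and the example values are illustrative assumptions, and the noise scale uses the textbook bound, which holds only for 0 < epsilon < 1.

```python
import math
import random


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Noise scale of the classic Gaussian mechanism (textbook bound, 0 < epsilon < 1)."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("classic bound needs 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def clip_l2(message, clip_norm):
    """Scale the message down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in message))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in message]


def privatize_message(message, clip_norm, epsilon, delta, rng):
    """Clip the message, then add i.i.d. Gaussian noise to every coordinate.

    Two clipped messages differ by at most 2 * clip_norm in L2 norm,
    so that value is used as the sensitivity (an assumption of this sketch).
    """
    clipped = clip_l2(message, clip_norm)
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Hypothetical two-dimensional message sent by one agent.
rng = random.Random(0)
noisy = privatize_message([3.0, 4.0], clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

With a fixed noise scale like this, the noise magnitude is independent of what the sender has learned, which is the source of instability that the abstract says a learned stochastic sender is meant to alleviate.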

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Agent communication

2607

A Generalized Deep Markov Random Fields Framework for Fake News Detection

Yiqi Dong, Di Jin, Xiaobao Wang, Yawen Li, Xiaowen Su, Dongxiao He

Recently, the wanton dissemination of fake news on social media has adversely affected our lives, rendering automatic fake news detection a pressing issue. Current methods are often fully supervised and typically employ deep neural networks (DNN) to learn implicit relevance from labeled data, ignoring explicitly shared properties (e.g., inflammatory expressions) across fake news. To address this limitation, we propose a graph-theoretic framework, called Generalized Deep Markov Random Fields Framework (GDMRFF), that inherits the capability of deep learning while at the same time exploiting the correlations among the news articles (including labeled and unlabeled data). Specifically, we first leverage a DNN-based module to learn implicit relations, which we then reveal as the unary function of MRF. Pairwise functions with refining effects to encapsulate human insights are designed to capture the explicit association among all samples. Meanwhile, an event removal module is introduced to remove event impact on pairwise functions. Note that we train GDMRFF with the semi-supervised setting, which decreases the reliance on labeled data while maximizing the potential of unlabeled data. We further develop an Ambiguity Learning Guided MRF (ALGM) model as a concretization of GDMRFF. Experiments show that ALGM outperforms the compared methods significantly on two datasets, especially when labeled data is limited.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Web and social networks
Data Mining -> DM: Mining graphs
Natural Language Processing -> NLP: Text classification

2611

RuleMatch: Matching Abstract Rules for Semi-supervised Learning of Human Standard Intelligence Tests

Yunlong Xu, Lingxiao Yang, Hongzhi You, Zonglei Zhen, Da-Hui Wang, Xiaohong Wan, Xiaohua Xie, Ru-Yuan Zhang

Raven’s Progressive Matrices (RPM), one of the standard intelligence tests in human psychology, has recently emerged as a powerful tool for studying abstract visual reasoning (AVR) abilities in machines. Although existing computational models for RPM problems achieve good performance, they require a large number of labeled training examples for supervised learning. In contrast, humans can efficiently solve unlabeled RPM problems after learning from only a few example questions. Here, we develop a semi-supervised learning (SSL) method, called RuleMatch, to train deep models with a small number of labeled RPM questions along with other unlabeled questions. Moreover, instead of using the pixel-level augmentation common in object perception tasks, we exploit the nature of RPM problems and augment the data at the level of abstract rules. Specifically, we disrupt the possible rules contained among context images in an RPM question and force the two augmented variants of the same unlabeled sample to obey the same abstract rule and predict a common pseudo label for training. Extensive experiments show that the proposed RuleMatch achieves state-of-the-art performance on two popular RAVEN datasets. Our work makes an important stride in aligning abstract analogical visual reasoning abilities in machines and humans. Our code is available at https://github.com/ZjjConan/AVR-RuleMatch.

List of keywords

Computer Vision -> CV: Visual reasoning and symbolic representation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

2612

DFVSR: Directional Frequency Video Super-Resolution via Asymmetric and Enhancement Alignment Network

Shuting Dong, Feng Lu, Zhe Wu, Chun Yuan

Frequency-based methods have recently received much attention due to their impressive restoration of detail and structure in video super-resolution. However, most of these frequency-based methods have three major limitations: 1) insufficient exploration of object motion information, 2) inadequate enhancement for high-fidelity regions, and 3) loss of spatial information during convolution. In this paper, we propose a novel network, Directional Frequency Video Super-Resolution (DFVSR), to address these limitations. Specifically, we reconsider object motion from a new perspective and propose Directional Frequency Representation (DFR), which not only inherits the ability of frequency representations to capture detail and structure information but also contains the direction information of the object motion, which is extremely significant in videos. Based on this representation, we propose a Directional Frequency-Enhanced Alignment (DFEA) that uses double enhancements of task-related information to ensure the retention of high-fidelity frequency regions and generate a high-quality alignment feature. Furthermore, we design a novel Asymmetrical U-shaped network architecture to progressively fuse these alignment features and produce the final output. This architecture enables intercommunication between the same resolution levels of the encoder and decoder to supplement spatial information. Powered by the above designs, our method achieves superior performance over state-of-the-art models in both quantitative and qualitative evaluations.

List of keywords

Computer Vision -> CV: Image and video retrieval
Computer Vision -> CV: Other

2614

SS-BSN: Attentive Blind-Spot Network for Self-Supervised Denoising with Nonlocal Self-Similarity

Young-Joo Han, Ha-Jin Yu

Recently, numerous studies have been conducted on supervised learning-based image denoising methods. However, these methods rely on large-scale noisy-clean image pairs, which are difficult to obtain in practice. Denoising methods with self-supervised training that can be trained with only noisy images have been proposed to address the limitation. These methods are based on the convolutional neural network (CNN) and have shown promising performance. However, CNN-based methods do not consider using nonlocal self-similarities essential in the traditional method, which can cause performance limitations. This paper presents self-similarity attention (SS-Attention), a novel self-attention module that can capture nonlocal self-similarities to solve the problem. We focus on designing a lightweight self-attention module in a pixel-wise manner, which is nearly impossible to implement using the classic self-attention module due to the quadratically increasing complexity with spatial resolution. Furthermore, we integrate SS-Attention into the blind-spot network called self-similarity-based blind-spot network (SS-BSN). We conduct the experiments on real-world image denoising tasks. The proposed method quantitatively and qualitatively outperforms state-of-the-art methods in self-supervised denoising on the Smartphone Image Denoising Dataset (SIDD) and Darmstadt Noise Dataset (DND) benchmark datasets.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Self-supervised Learning

2619

StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset

Chaofan Huo, Ye Shi, Yuexin Ma, Lan Xu, Jingyi Yu, Jingya Wang

Modeling and capturing the 3D spatial arrangement of the human and the object is the key to perceiving 3D human-object interaction from monocular images. In this work, we propose to use the Human-Object Offset between anchors, which are densely sampled from the surfaces of the human mesh and the object mesh, to represent the human-object spatial relation. Compared with previous works, which use a contact map or an implicit distance field to encode 3D human-object spatial relations, our method is a simple and efficient way to encode the highly detailed correlation between the human and the object. Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image. During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples based on this posterior distribution and minimizing the 2D-3D corresponding reprojection loss. Extensive experimental results show that our method significantly outperforms the SOTA on two challenging benchmarks, the BEHAVE and InterCap datasets.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Biometrics, face, gesture and pose recognition

2638

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou

Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., mask distribution may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion when learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on the intra-mask correlation and that between features and mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning

2653

Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning

Kiarash Kazari, Ezzeldin Shereen, Gyorgy Dan

We consider the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning. We propose a decentralized scheme that allows agents to detect the abnormal behavior of one compromised agent. Our approach is based on a recurrent neural network (RNN) trained during cooperative learning to predict the action distribution of other agents based on local observations. The predicted distribution is used for computing a normality score for the agents, which allows the detection of the misbehavior of other agents. To explore the robustness of the proposed detection scheme, we formulate the worst-case attack against our scheme as a constrained reinforcement learning problem. We propose to compute an attack policy by optimizing the corresponding dual function using reinforcement learning. Extensive simulations on various multi-agent benchmarks show the effectiveness of the proposed detection scheme in detecting state-of-the-art attacks and in limiting the impact of undetectable attacks.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI

2666

Deliberation and Voting in Approval-Based Multi-Winner Elections

Kanav Mehra, Nanda Kishore Sreenivas, Kate Larson

Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using established models from the opinion-dynamics literature and study the effect of different deliberation mechanisms on voting outcomes achieved when using well-studied voting rules. Our results show that deliberation generally improves welfare and representation guarantees, but the results are sensitive to how the deliberation process is organized. We also show, experimentally, that simple voting rules, such as approval voting, perform as well as more sophisticated rules such as proportional approval voting or method of equal shares if deliberation is properly supported. This has ramifications on the practical use of such voting rules in citizen-focused democratic processes.
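
To make the two named rules concrete, the following minimal sketch compares approval voting (AV) with proportional approval voting (PAV) on a toy profile. It covers only the voting step, not the deliberation model studied in the paper. The ballots, candidate names, and brute-force PAV search are illustrative assumptions suitable for small instances only.

```python
from itertools import combinations


def approval_voting(ballots, candidates, k):
    """Select the k candidates with the most approvals (ties broken by name)."""
    counts = {c: sum(c in ballot for ballot in ballots) for c in candidates}
    ranked = sorted(candidates, key=lambda c: (-counts[c], c))
    return tuple(sorted(ranked[:k]))


def pav_score(ballots, committee):
    """PAV score: each voter contributes 1 + 1/2 + ... + 1/h for h approved winners."""
    chosen = set(committee)
    total = 0.0
    for ballot in ballots:
        hits = len(chosen & ballot)
        total += sum(1.0 / j for j in range(1, hits + 1))
    return total


def proportional_approval_voting(ballots, candidates, k):
    """Brute-force PAV: try every size-k committee and keep the best-scoring one."""
    return max(combinations(sorted(candidates), k),
               key=lambda committee: pav_score(ballots, committee))


# Hypothetical profile: a majority bloc, one extra supporter of "a", and a minority bloc.
candidates = ["a", "b", "c", "d"]
ballots = [{"a", "b"}] * 3 + [{"a"}] + [{"c"}] * 2

av_winners = approval_voting(ballots, candidates, k=2)
pav_winners = proportional_approval_voting(ballots, candidates, k=2)
```

On this profile AV selects the two candidates backed by the majority bloc, while PAV gives one seat to the minority's candidate, because a voter's second approved winner adds only 1/2 to the score.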

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence

2670

FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation

Hanlin Gu, Jiahuan Luo, Yan Kang, Lixin Fan, Qiang Yang

Vertical federated learning (VFL) allows an active party with labeled data to leverage auxiliary features from the passive parties to improve model performance. Concerns about the private feature and label leakage in both the training and inference phases of VFL have drawn wide research attention. In this paper, we propose a general privacy-preserving vertical federated deep learning framework called FedPass, which leverages adaptive obfuscation to protect the feature and label simultaneously. Strong privacy-preserving capabilities about private features and labels are theoretically proved (in Theorems 1 and 2). Extensive experimental results with different datasets and network architectures also justify the superiority of FedPass against existing methods in light of its near-optimal trade-off between privacy and model performance.

List of keywords

Machine Learning -> ML: Federated learning
Computer Vision -> CV: Bias, fairness and privacy

2671

Contrastive Label Enhancement

Yifei Wang, Yiyang Zhou, Jihua Zhu, Xinyuan Liu, Wenbiao Yan, Zhiqiang Tian

Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity. Since it is difficult to directly obtain label distributions, many studies are focusing on how to recover label distributions from logical labels, dubbed label enhancement (LE). Existing LE methods estimate label distributions by simply building a mapping relationship between features and label distributions under the supervision of logical labels. They typically overlook the fact that both features and logical labels are descriptions of the instance from different views. Therefore, we propose a novel method called Contrastive Label Enhancement (ConLE) which integrates features and logical labels into the unified projection space to generate high-level features by contrastive learning strategy. In this approach, features and logical labels belonging to the same sample are pulled closer, while those of different samples are projected farther away from each other in the projection space. Subsequently, we leverage the obtained high-level features to gain label distributions through a well-designed training strategy that considers the consistency of label attributes. Extensive experiments on LDL benchmark datasets demonstrate the effectiveness and superiority of our method.

List of keywords

Machine Learning -> ML: Multi-label
Machine Learning -> ML: Multi-view learning

2676

SAT-Based PAC Learning of Description Logic Concepts

Balder Ten Cate, Maurice Funk, Jean Jung, Carsten Lutz

We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic ELHr based on a SAT solver, and compare its performance to a state-of-the-art learner.

List of keywords

Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Learning and reasoning

2692

A Novel Demand Response Model and Method for Peak Reduction in Smart Grids — PowerTAC

Sanjay Chandlekar, Shweta Jain, Sujit Gujar

One of the widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers’ (agents’) usage patterns in response to the signal from the distribution company. Often, these signals are in the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that depicts the probability of an agent reducing its load as a function of the discounts offered to them. We call it reduction probability (RP). RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS–ExpResponse, that outputs the discounts to each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB–ExpResponse, to learn RRs. Experimentally we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
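
As background for the bandit component, the standard UCB1 index rule can be sketched as follows. This is the generic textbook algorithm, not MJSUCB–ExpResponse. The Bernoulli reward model (a hypothetical agent either reduces its load or does not), the arm means, and the horizon are assumptions made only for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, rng):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled."""
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Empirical mean plus an exploration bonus that shrinks as an arm is pulled more.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Hypothetical setting: two discount levels with unknown acceptance probabilities.
pulls = ucb1([0.2, 0.8], horizon=2000, rng=random.Random(0))
```

Because the exploration bonus decays with the number of pulls, the weaker arm is selected only logarithmically often in the horizon, which is the behavior behind sublinear regret.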

List of keywords

Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MDA: Energy, environment and sustainability

2693

MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation

Yiheng Zhu, Zhenqiu Ouyang, Ben Liao, Jialu Wu, Yixuan Wu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made by resorting to generative models for graphs. However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models. The primary challenge to hierarchical generation is the non-differentiable issue caused by the generation of intermediate discrete coarsened graph structures. To sidestep this issue, we cast the tricky hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, implying its high capacity to model data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Sequence and graph learning

2705

An Empirical Study on the Language Modal in Visual Question Answering

Daowan Peng, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Dangyang Chen

Generalization beyond experience to out-of-distribution data is of great significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models perform well on in-domain data partly due to the language prior, which, however, limits their generalizability in real-world situations. It is widely known that the bias-dependency issue is the culprit of such dilemmas. This paper analyzes the problem of language bias from a new perspective and aims to obtain more information about the issue through empirical analysis. We found, specifically, that 1) the postfix is more essential than the question type in causing the bias issue; 2) word-sequence-related disturbance of the question prevents the VQA model from learning fixed patterns in the original question, hence reducing its reliance on bias. The experimental results show that almost all tested models improved their performance when trained with disturbed questions on VQA-CPv2, and LXMERT even achieves a 10-point gain without adopting any de-bias methods. Additionally, we propose a data enhancement method and conduct extensive experiments. The experimental results show that the proposed method can improve performance on the out-of-distribution benchmark. We hope this study inspires novel insights for future research on designing bias-reduction approaches.

List of keywords

Machine Learning -> ML: Multi-modal learning
Natural Language Processing -> NLP: Question answering

2716

LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity

Yuhan Chen, Yihong Luo, Jing Tang, Liang Yang, Siya Qiu, Chuan Wang, Xiaochun Cao

Heterophily has been considered an issue that hurts the performance of Graph Neural Networks (GNNs). To address this issue, some existing work uses a graph-level weighted fusion of the information of multi-hop neighbors to include more nodes with homophily. However, the heterophily might differ among nodes, which requires considering the local topology. Motivated by this, we propose to use the local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module. For better fusion, we propose a novel and efficient Initial Residual Difference Connection (IRDC) to extract more informative multi-hop information. Moreover, we provide theoretical analysis on the effectiveness of LocalSim in representing node homophily on synthetic graphs. Extensive evaluations over real benchmark datasets show that our proposed method, namely Local Similarity Graph Neural Network (LSGNN), can offer comparable or superior state-of-the-art performance on both homophilic and heterophilic graphs. Meanwhile, the plug-and-play model can significantly boost the performance of existing GNNs.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs

2727

Hierarchical Semantic Contrast for Weakly Supervised Semantic Segmentation

Yuanchen Wu, Xiaoqiang Li, Songmin Dai, Jide Li, Tong Liu, Shaorong Xie

Weakly supervised semantic segmentation (WSSS) with image-level annotations has achieved great progress through the class activation map (CAM). Since vanilla CAMs can hardly serve as guidance to bridge the gap between full and weak supervision, recent studies explore semantic representations to make CAM fit for WSSS and demonstrate encouraging results. However, they generally exploit single-level semantics, which may hamper the model from learning a comprehensive semantic structure. Motivated by the prior that each image has multiple levels of semantics, we propose hierarchical semantic contrast (HSC) to ameliorate the above problem. It conducts semantic contrast from a coarse-grained to a fine-grained perspective, including the ROI level, class level, and pixel level, helping the model learn a better understanding of object patterns. To further improve CAM quality, building upon HSC, we explore consistency regularization of cross supervision and develop momentum prototype learning to utilize abundant semantics across different images. Extensive studies demonstrate that our plug-and-play learning paradigm, HSC, can significantly boost CAM quality on both non-saliency-guided and saliency-guided baselines, and establish new state-of-the-art WSSS performance on the PASCAL VOC 2012 dataset. Code is available at https://github.com/Wu0409/HSC_WSSS.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Scene analysis and understanding

2734

Open Anomalous Trajectory Recognition via Probabilistic Metric Learning

Qiang Gao, Xiaohan Wang, Chaoran Liu, Goce Trajcevski, Li Huang, Fan Zhou

Identifying anomalous trajectories enables appropriate responses to a variety of unusual traffic behaviors or driving patterns. Due to the limit of closed-set context, existing approaches fail to recognize the unknown anomalous trajectories, resulting in an insufficient self-motivated learning paradigm. We investigate the novel Anomalous Trajectory Recognition problem in an Open-world scenario (ATRO) and introduce a novel probabilistic Metric learning model, namely ATROM, to address it. Specifically, ATROM can detect the presence of unknown anomalous behavior in addition to identifying known behavior. It has a Mutual Interaction Distillation that uses contrastive metric learning to explore the interactive semantics regarding the diverse behavioral intents and a Probabilistic Trajectory Embedding that forces the trajectories with distinct behaviors to follow different Gaussian priors. More importantly, ATROM offers a probabilistic metric rule to discriminate between known and unknown behavioral patterns by taking advantage of the approximation of multiple priors. Experimental results on two large-scale trajectory datasets demonstrate the superiority of ATROM in addressing both known and unknown anomalous patterns.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Transportation

2738

Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors

Han Liu, Xingshuo Huang, Xiaotong Zhang, Qimai Li, Fenglong Ma, Wei Wang, Hongyang Chen, Hong Yu, Xianchao Zhang

Decision-based methods have been shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require access to the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it directly affects the query efficiency. Recent works have attempted to utilize gradient priors to help score-based methods obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, and are thus difficult to extend directly to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus preserving the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate convergence, thus improving the query efficiency. Extensive experiments have demonstrated that the proposed method outperforms other strong baselines significantly.

List of keywords

Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods

2748

VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning

Xiaofan Li, Yachao Zhang, Shiran Bian, Yanyun Qu, Yuan Xie, Jianping Fan, Zhongchao Shi

Unlike conventional zero-shot learning (CZSL), which only focuses on the recognition of unseen classes by using the classifier trained on seen classes and semantic embeddings, generalized zero-shot learning (GZSL) aims at recognizing both the seen and unseen classes, so it is more challenging due to the extreme training imbalance. Recently, some feature generation methods have introduced metric learning to enhance the discriminability of visual features. Although these methods achieve good results, they focus only on metric learning in the visual feature space to enhance features and ignore the association between the feature space and the semantic space. Since the GZSL method uses semantics as prior knowledge to migrate visual knowledge to unseen classes, the consistency between visual space and semantic space is critical. To this end, we propose relational metric learning, which can relate the metrics in the two spaces and make the distribution of the two spaces more consistent. Based on the generation method and relational metric learning, we propose a novel GZSL method, termed VS-Boost, which can effectively boost the association between vision and semantics. The experimental results demonstrate that our method is effective and achieves significant gains on five benchmark datasets compared with the state-of-the-art methods.

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Recognition (object detection, categorization)

2753

Low-Confidence Samples Mining for Semi-supervised Object Detection

Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Jun-Hai Yong, Bin Wang

Reliable pseudo labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo labels with high confidence, thereby ignoring valuable pseudo labels with lower confidence. Additionally, the insufficient excavation of unlabeled data results in an excessively low recall rate, thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low-confidence pseudo labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch based on low-resolution feature maps to extract reliable large-area instances, the IoUs of which are higher than those of small-area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutual-learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under a 5% labeling ratio.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Data Mining -> DM: Applications
Data Mining -> DM: Exploratory data mining

2754

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages of both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between the memory consumption and the hardware utilization, thereby automatically generating the distributed computation graph and maximizing the overall system throughput. In addition, OSDP introduces operator splitting to further alleviate peak memory footprints during training with negligible overheads, which enables the training of larger models as well as higher throughput. Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.

List of keywords

Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
Data Mining -> DM: Big data and scalability

2757

Bayesian Optimization with Switching Cost: Regret Analysis and Lookahead Variants

Peng Liu, Haowei Wang, Wei Qiyu

Recently, Bayesian Optimization (BO) has received increasing attention due to its efficiency in optimizing expensive-to-evaluate functions. For some practical problems, it is essential to consider the switching cost between consecutive sampling locations given a total traveling budget. For example, when using a drone to locate cracks in a building wall or search for lost survivors in the wild, the search path needs to be efficiently planned given the limited battery power of the drone. Tackling such problems requires a careful cost-benefit analysis of candidate locations and keeping a balance between exploration and exploitation. In this work, we formulate such a problem as a constrained Markov Decision Process (MDP) and solve it by proposing a new distance-adjusted multi-step look-ahead acquisition function, the distUCB, and using rollout approximation. We also provide a theoretical regret analysis of the distUCB-based Bayesian optimization algorithm. In addition, the empirical performance of the proposed algorithm is tested based on both synthetic and real data experiments, and it shows that our cost-aware non-myopic algorithm performs better than other popular alternatives.
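The cost-benefit trade-off described in this abstract can be illustrated with a minimal one-step sketch. This is not the paper's distUCB, whose exact form, multi-step look-ahead, and rollout approximation are not given here. The function name `dist_adjusted_ucb`, the linear distance penalty, and the parameters `beta` and `lam` are all assumptions made for illustration.

```python
import math

def dist_adjusted_ucb(candidates, current, remaining_budget, beta=2.0, lam=0.5):
    """Pick the candidate with the best distance-penalized UCB score.

    candidates: list of (location, posterior_mean, posterior_std) tuples,
                where location is a coordinate tuple.
    current:    current sampling location (coordinate tuple).
    Hypothetical scoring rule: mean + beta * std - lam * travel_distance.
    Candidates that exceed the remaining travel budget are skipped.
    """
    best, best_score = None, -math.inf
    for loc, mean, std in candidates:
        travel = math.dist(current, loc)  # Euclidean switching cost
        if travel > remaining_budget:
            continue  # infeasible under the travel budget
        score = mean + beta * std - lam * travel
        if score > best_score:
            best, best_score = loc, score
    return best, best_score

# Toy posterior summaries at three locations (made-up numbers).
cands = [((0, 0), 1.0, 0.1), ((3, 4), 2.0, 0.5), ((10, 0), 5.0, 1.0)]
print(dist_adjusted_ucb(cands, current=(0, 0), remaining_budget=6.0))
```

With `lam=0` the rule reduces to a plain UCB restricted to reachable points, which makes the effect of the distance penalty easy to compare.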

List of keywords

Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Hyperparameter optimization

2758

Learning Survival Distribution with Implicit Survival Function

Yu Ling, Weimin Tan, Bo Yan

Survival analysis aims at modeling the relationship between covariates and event occurrence with some untracked (censored) samples. In implementation, existing methods model the survival distribution with strong assumptions or in a discrete time space for likelihood estimation with censorship, which leads to weak generalization. In this paper, we propose Implicit Survival Function (ISF) based on Implicit Neural Representation for survival distribution estimation without strong assumptions, and employ numerical integration to approximate the cumulative distribution function for prediction and optimization. Experimental results show that ISF outperforms the state-of-the-art methods on three public datasets and is robust to the hyperparameter controlling estimation precision.
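The role of numerical integration in a censored likelihood can be sketched as follows. This is only an illustration under stated assumptions: the density here is a hand-written function standing in for a learned model, and the names `cdf_by_trapezoid` and `log_likelihood`, the trapezoidal rule, and the `steps` parameter (playing the part of a precision hyperparameter) are hypothetical rather than taken from ISF.

```python
import math

def cdf_by_trapezoid(density, t, steps=1000):
    """Approximate F(t) = integral of density over [0, t] with the trapezoidal rule."""
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.5 * (density(0.0) + density(t))
    for i in range(1, steps):
        total += density(i * h)
    return min(1.0, total * h)

def log_likelihood(density, time, observed, steps=1000):
    """Log-likelihood of one sample.

    observed=True:  the event happened at `time`, so use log f(time).
    observed=False: the sample was censored at `time`, so use log S(time),
                    with S(t) = 1 - F(t) obtained by numerical integration.
    """
    if observed:
        return math.log(density(time))
    survival = max(1.0 - cdf_by_trapezoid(density, time, steps), 1e-12)
    return math.log(survival)

# Stand-in density: exponential distribution with rate 0.5.
exp_density = lambda s: 0.5 * math.exp(-0.5 * s)
print(cdf_by_trapezoid(exp_density, 2.0))
print(log_likelihood(exp_density, 2.0, observed=False))
```

Increasing `steps` tightens the approximation at the cost of more density evaluations, which is the kind of precision trade-off the abstract's robustness claim refers to.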

List of keywords

Machine Learning -> ML: Other

2759

Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions

Alessandro Daniele, Tommaso Campari, Sagar Malhotra, Luciano Serafini

Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL simultaneously learns the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.

List of keywords

Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning

2774

Approximating Fair Division on D-Claw-Free Graphs

Zbigniew Lonc

We study the problem of fair allocation of indivisible goods that form a graph and the bundles that are distributed to agents are connected subgraphs of this graph. We focus on the maximin share and the proportional fairness criteria. It is well-known that allocations satisfying these criteria may not exist for many graphs including complete graphs and cycles. Therefore, it is natural to look for approximate allocations, i.e., allocations guaranteeing each agent a certain portion of the value that is satisfactory to her. In this paper we consider the class of graphs of goods which do not contain a star with d+1 edges (where d > 1) as an induced subgraph. For this class of graphs we prove that there is an allocation assigning each agent a connected bundle of value at least 1/d of her maximin share. Moreover, for the same class of graphs of goods, we show a theorem which specifies what fraction of the proportional share can be guaranteed to each agent if the values of single goods for the agents are bounded by a given fraction of this share.
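To make the maximin share with connected bundles concrete, the sketch below computes it by brute force for the simplest case, a path of goods, where connected bundles are contiguous segments. A path contains no induced star with 3 edges, so it falls in the class with d = 2, and the stated bound corresponds to half of this value. The function name `maximin_share_on_path` and the assumption of non-negative additive values are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def maximin_share_on_path(values, n_agents):
    """Maximin share of one agent when goods form a path.

    values:   the agent's (non-negative, additive) values for goods in path order.
    n_agents: number of connected bundles the path must be split into.
    Returns the largest achievable minimum bundle value over all ways of
    cutting the path into n_agents contiguous, non-empty segments.
    """
    m = len(values)
    if n_agents > m:
        return 0  # at least one bundle would have to be empty
    best = 0
    for cuts in combinations(range(1, m), n_agents - 1):
        bounds = (0, *cuts, m)
        worst = min(sum(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = max(best, worst)
    return best

values = [3, 1, 2, 4]
mms = maximin_share_on_path(values, n_agents=2)
d = 2  # a path has no induced star with d + 1 = 3 edges
print(mms, mms / d)  # maximin share and the 1/d threshold from the abstract
```

The enumeration is exponential in the number of agents and is meant only to illustrate the definition on small instances.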

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation

2777

Learning Gaussian Mixture Representations for Tensor Time Series Forecasting

Jiewen Deng, Renhe Jiang, Jinliang Deng, Xuan Song

Tensor time series (TTS) data, a generalization of one-dimensional time series on a high-dimensional space, is ubiquitous in real-world scenarios, especially in monitoring systems involving multi-source spatio-temporal data (e.g., transportation demands and air pollutants). Compared to modeling time series or multivariate time series, which has received much attention and achieved tremendous progress in recent years, tensor time series has received less attention. Properly coping with tensor time series is a much more challenging task, due to its high-dimensional and complex inner structure. In this paper, we develop a novel TTS forecasting framework, which seeks to individually model each heterogeneity component implied in the time, the location, and the source variables. We name this framework GMRL, short for Gaussian Mixture Representation Learning. Experimental results on two real-world TTS datasets verify the superiority of our approach compared with the state-of-the-art baselines. Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle the heterogeneity components with different evolutions.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Time series and data streams

2780

Globally Consistent Federated Graph Autoencoder for Non-IID Graphs

Kun Guo, Yutong Fang, Qingqing Huang, Yuting Liang, Ziyao Zhang, Wenyu He, Liu Yang, Kai Chen, Ximeng Liu, Wenzhong Guo

Graph neural networks (GNNs) have been applied successfully in many machine learning tasks due to their advantages in utilizing neighboring information. Recently, with the global enactment of privacy protection regulations, federated GNNs have gained increasing attention in academia and industry. However, the graphs owned by different participants could be non-independently-and-identically distributed (non-IID), leading to the deterioration of federated GNNs’ accuracy. In this paper, we propose a globally consistent federated graph autoencoder (GCFGAE) to overcome the non-IID problem in unsupervised federated graph learning via three innovations. First, by integrating federated learning with split learning, we train a unique global model instead of FedAvg-styled global and local models, yielding results consistent with that of the centralized GAE. Second, we design a collaborative computation mechanism considering overlapping vertices to reduce communication overhead during forward propagation. Third, we develop a layer-wise and block-wise gradient computation strategy to reduce the space and communication complexity during backward propagation. Experiments on real-world datasets demonstrate that GCFGAE achieves not only higher accuracy but also around 500 times lower communication overhead and 1000 times smaller space overhead than existing federated GNN models.

List of keywords

Machine Learning -> ML: Federated learning
Data Mining -> DM: Mining graphs

2788

Video Object Segmentation in Panoptic Scenes

Yuanyou Xu, Zongxin Yang, Yi Yang

In this paper, we introduce video object segmentation (VOS) to panoptic scenes and present a large-scale benchmark as well as a baseline method for it. Previous benchmarks for VOS with sparse annotations are not sufficient to train or evaluate a model that needs to process all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers various real-world object categories which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. Considering that tracking and segmenting numerous dense objects in panoptic scenes is more challenging than processing sparse objects, we propose a strong baseline method named panoptic object association with transformers (PAOT). A pyramid architecture and an efficient transformer structure are proposed for multi-scale object matching. In addition, panoptic identification embeddings are generated by decoupled identity banks for thing and stuff objects for panoptic object association. Experimental results show that VIPOSeg can not only boost the performance of VOS models by panoptic training but also evaluate them comprehensively in panoptic scenes. The evaluation results show that previous methods for generic VOS still need to improve in performance and efficiency when dealing with panoptic scenes, while our PAOT method achieves SOTA performance with good efficiency on both VIPOSeg and previous VOS benchmarks.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding

2789

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

List of keywords

Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities

2793

Revisiting the Evaluation of Deep Learning-Based Compiler Testing

Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Chengnian Sun, Shing-Chi Cheung

A high-quality program generator is essential to effective automated compiler testing. Engineering such a program generator is difficult, time-consuming, and specific to the language under testing, thus requiring tremendous efforts from human experts with language-specific domain knowledge. To avoid repeatedly writing program generators for different languages, researchers recently proposed a language-agnostic approach based on deep learning techniques to automatically learn a program generator (referred to as DLG) from existing programs. Evaluations show that DLGs outperform Language-Specific Program Generators (LSGs) in testing compilers. However, we argue that it is unfair to use LSGs as baselines to evaluate DLGs. LSGs aim to validate compiler optimizations by only generating compilable, well-defined test programs; this restriction inevitably impairs the diversity of the language features used in the generated programs. In contrast, DLGs do not aim to validate the correctness of compiler optimizations, and their generated programs are not guaranteed to be well-defined or even compilable. Therefore, it is not surprising that DLG-generated programs are more diverse in terms of used language features than LSG-generated ones. This study revisits the evaluation of DLGs, and proposes a new, fair, simple yet strong baseline named Kitten for evaluating DLGs. Given a dataset consisting of human-written programs, instead of using deep learning techniques to learn a program generator, Kitten directly derives new programs by mutating the programs in the dataset. Extensive experiments with more than 1,500 CPU-hours demonstrate that the state-of-the-art DLGs fail to compete against such a simple baseline: 3 vs. 1,750 hang bugs, 1 vs. 34 distinct compiler crashes. We believe that DLGs still have much room for improvement.
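The idea of deriving new test programs by mutating existing ones can be sketched in a few lines. The abstract does not describe Kitten's actual mutation operators, so everything below is an assumption: the naive tokenizer, the `OPERATORS` list, the literal replacements, and the function name `mutate` are hypothetical and only show the general shape of a mutation-based generator. As with the DLGs discussed above, the output is not guaranteed to compile.

```python
import random
import re

# Naive tokenizer: numbers, identifiers/keywords, or single punctuation marks.
TOKEN = re.compile(r"\d+|\w+|[^\s\w]")
OPERATORS = ["+", "-", "*", "/", "%"]

def mutate(program, rng):
    """Return a copy of `program` with one operator or integer literal changed.

    Hypothetical mutation rule: pick one token that is an arithmetic operator
    or an integer literal, then swap it for a different operator or for a
    boundary-style integer value. Programs with no such token are returned
    unchanged.
    """
    tokens = TOKEN.findall(program)
    sites = [i for i, t in enumerate(tokens) if t in OPERATORS or t.isdigit()]
    if not sites:
        return program
    i = rng.choice(sites)
    if tokens[i].isdigit():
        tokens[i] = str(rng.choice([0, 1, -1, 2**31 - 1]))
    else:
        tokens[i] = rng.choice([op for op in OPERATORS if op != tokens[i]])
    return " ".join(tokens)

seed_program = "int main ( ) { return 1 + 2 ; }"
print(mutate(seed_program, random.Random(0)))
```

In a real pipeline each mutant would be fed to the compiler under test and the outcome (crash, hang, normal exit) recorded; that harness is omitted here.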

List of keywords

Multidisciplinary Topics and Applications -> MDA: Software engineering

2815

Pyramid Diffusion Models For Low-light Image Enhancement

Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Neural generative models, auto encoders, GANs

2816

Genetic Prompt Search via Exploiting Language Model Probabilities

Jiangjiang Zhao, Zhuoran Wang, Fangchun Yang

Prompt tuning for large-scale pretrained language models (PLMs) has shown remarkable potential, especially in low-resource scenarios such as few-shot learning. Moreover, derivative-free optimisation (DFO) techniques make it possible to tune prompts for a black-box PLM to better fit downstream tasks. However, there are usually preconditions to apply existing DFO-based prompt tuning methods, e.g. the backbone PLM needs to provide extra APIs so that hidden states (and/or embedding vectors) can be injected into it as continuous prompts, or carefully designed (discrete) manual prompts need to be available beforehand, serving as the initial states of the tuning algorithm. To waive such preconditions and make DFO-based prompt tuning ready for general use, this paper introduces a novel genetic algorithm (GA) that evolves from empty prompts, and uses the predictive probabilities derived from the backbone PLM(s) on the basis of a (few-shot) training set to guide the token selection process during prompt mutations. Experimental results on diverse benchmark datasets show that the proposed precondition-free method significantly outperforms the existing DFO-style counterparts that require preconditions, including black-box tuning, genetic prompt search and gradient-free instructional prompt search.
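A toy sketch of a genetic search that starts from the empty prompt and samples mutation tokens from a probability table is given below. It is not the paper's algorithm: in place of a real PLM, `vocab_probs` is a hand-written token distribution and `fitness` is a made-up scoring function, and the names `evolve_prompt`, `pop_size`, and `max_len` are hypothetical.

```python
import random

def evolve_prompt(vocab_probs, fitness, generations=20, pop_size=8, max_len=4, seed=0):
    """Evolve a discrete prompt (tuple of tokens) starting from the empty prompt.

    vocab_probs: dict token -> probability, a stand-in for token probabilities
                 that a language model would provide.
    fitness:     callable scoring a prompt, a stand-in for few-shot accuracy.
    Each mutation inserts or replaces one token sampled from vocab_probs;
    the best pop_size prompts survive each generation (elitism).
    """
    rng = random.Random(seed)
    tokens, weights = zip(*vocab_probs.items())
    population = [()]  # start from the empty prompt
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            child = list(rng.choice(population))
            new_tok = rng.choices(tokens, weights=weights, k=1)[0]
            if child and (len(child) >= max_len or rng.random() < 0.5):
                child[rng.randrange(len(child))] = new_tok  # replace a token
            else:
                child.insert(rng.randint(0, len(child)), new_tok)  # insert a token
            children.append(tuple(child))
        pool = set(population) | set(children)
        population = sorted(pool, key=fitness, reverse=True)[:pop_size]
    return population[0]

# Toy setup: the "model" favors task-related words; fitness rewards covering them.
probs = {"classify": 0.4, "sentiment": 0.4, "the": 0.1, "banana": 0.1}
score = lambda p: len(set(p) & {"classify", "sentiment"})
print(evolve_prompt(probs, score))
```

Because parents are kept in the selection pool, the best fitness never decreases across generations; a real system would replace both stand-ins with calls to the backbone model.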

List of keywords

Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Other

2836

The Hardness of Reasoning about Probabilities and Causality

Benito van der Zander, Markus Bläser, Maciej Liskiewicz

We study formal languages which are capable of fully expressing quantitative probabilistic reasoning and do-calculus reasoning for causal effects, from a computational complexity perspective. Our main focus is on the satisfiability problems whose instance formulas allow expressing many tasks in probabilistic and causal inference. The main contribution of this work is establishing the exact computational complexity of these satisfiability problems. We introduce a new natural complexity class, named succETR, which can be viewed as a succinct variant of the well-studied class ∃R, and show that the problems are complete for succETR. Our results imply even stronger algorithmic limitations than were proven by Fagin, Halpern, and Megiddo (1990) and Mossé, Ibeling, and Icard (2022) for some variants of the standard languages used commonly in probabilistic and causal inference.

List of keywords

Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Knowledge Representation and Reasoning -> KRR: Causality
Machine Learning -> ML: Causality

2840

Part Aware Contrastive Learning for Self-Supervised Action Recognition

Yilei Hua, Wenhan Wu, Ce Zheng, Aidong Lu, Mengyuan Liu, Chen Chen, Shiqian Wu

In recent years, remarkable results have been achieved in self-supervised action recognition using skeleton sequences with contrastive learning. It has been observed that the semantic distinction of human action features is often represented by local body parts, such as legs or hands, which are advantageous for skeleton-based action recognition. This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations. To achieve this, a multi-head attention mask module is employed to learn the soft attention mask features from the skeletons, suppressing non-salient local features while accentuating local salient features, thereby bringing similar local features closer in the feature space. Additionally, ample contrastive pairs are generated by expanding contrastive pairs based on salient and non-salient features with global features, which guide the network to learn the semantic representations of the entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR learns local features under different data augmentation views. The experiment results demonstrate that the inclusion of local feature similarity significantly enhances skeleton-based action representation. Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets. The code and settings are available at this repository: https://github.com/GitHubOfHyl97/SkeAttnCLR.

List of keywords

Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Self-supervised Learning

2841

ODEE: A One-Stage Object Detection Framework for Overlapping and Nested Event Extraction

Jinzhong Ning, Zhihao Yang, Zhizheng Wang, Yuanyuan Sun, Hongfei Lin

The task of extracting overlapping and nested events has received significant attention in recent times, as prior research has primarily focused on extracting flat events, overlooking the intricacies of overlapping and nested occurrences. In this work, we present a new approach to Event Extraction (EE) by reformulating it as an object detection task on a table of token pairs. Our proposed one-stage event extractor, called ODEE, can handle overlapping and nested events. The model is designed with a vertex-based tagging scheme and two auxiliary tasks of predicting the spans and types of event trigger words and argument entities, leveraging the full span information of event elements. Furthermore, in the training stage, we introduce a negative sampling method for table cells to address the imbalance problem of positive and negative table cell tags, meanwhile improving computational efficiency. Empirical evaluations demonstrate that ODEE achieves the state-of-the-art performance on three benchmarks for overlapping and nested EE (i.e., FewFC, Genia11, and Genia13). Furthermore, ODEE outperforms current state-of-the-art methods in terms of both number of parameters and inference speed, indicating its high computational efficiency. To facilitate future research in this area, the codes are publicly available at https://github.com/NingJinzhong/ODEE.

List of keywords

Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Named entities

2847

Approximate Inference in Logical Credal Networks

Radu Marinescu, Haifeng Qian, Alexander Gray, Debarun Bhattacharjya, Francisco Barahona, Tian Gao, Ryan Riegel

The Logical Credal Network or LCN is a recent probabilistic logic designed for effective aggregation and reasoning over multiple sources of imprecise knowledge. An LCN specifies a set of probability distributions over all interpretations of a set of logical formulas for which marginal and conditional probability bounds on their truth values are known. Inference in LCNs involves the exact solution of a non-convex non-linear program defined over an exponentially large number of non-negative real valued variables and, therefore, is limited to relatively small problems. In this paper, we present ARIEL — a novel iterative message-passing scheme for approximate inference in LCNs. Inspired by classical belief propagation for graphical models, our method propagates messages that involve solving considerably smaller local non-linear programs. Experiments on several classes of LCNs demonstrate clearly that ARIEL yields high quality solutions compared with exact inference and scales to much larger problems than previously considered.

List of keywords

Uncertainty in AI -> UAI: Graphical models
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Uncertainty in AI -> UAI: Inference

2849

Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks

Yuchen Wang, Kexin Shi, Chengzhuo Lu, Yuguo Liu, Malu Zhang, Hong Qu

Brain-inspired spiking neural networks (SNNs) are receiving increasing attention due to their asynchronous event-driven characteristics and low power consumption. As attention mechanisms have recently become an indispensable part of sequence dependence modeling, the combination of SNNs and attention mechanisms holds great potential for energy-efficient and high-performance computing paradigms. However, existing works cannot benefit from both temporal-wise attention and the asynchronous characteristic of SNNs. To fully leverage the advantages of both SNNs and attention mechanisms, we propose an SNN-based spatial-temporal self-attention (STSA) mechanism, which calculates the feature dependence across the time and space domains without destroying the asynchronous transmission properties of SNNs. To further improve the performance, we also propose a spatial-temporal relative position bias (STRPB) for STSA to consider the spatiotemporal position of spikes. Based on the STSA and STRPB, we construct a spatial-temporal spiking Transformer framework, named STS-Transformer, which is powerful and enables SNNs to work in an asynchronous event-driven manner. Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our experimental results show that STS-Transformer outperforms other state-of-the-art models.

List of keywords

Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Cognitive systems

2869

Teacher Assistant-Based Knowledge Distillation Extracting Multi-level Features on Single Channel Sleep EEG

Heng Liang, Yucheng Liu, Haichao Wang, Ziyu Jia

Sleep stage classification is of great significance to the diagnosis of sleep disorders. However, existing sleep stage classification models based on deep learning are usually relatively large in size (wider and deeper), which makes them hard to deploy on wearable devices. Therefore, it is a challenge to lighten the existing sleep stage classification models. In this paper, we propose a novel general knowledge distillation framework for sleep stage classification tasks called SleepKD. Our SleepKD, composed of the multi-level module, teacher assistant module, and other knowledge distillation modules, aims to lighten large-scale sleep stage classification models. Specifically, the multi-level module is able to transfer the multi-level knowledge extracted from sleep signals by the teacher model (large-scale model) to the student model (lightweight model). Moreover, the teacher assistant module bridges the large gap between the teacher and student networks, and further improves the distillation. We evaluate our method on two public sleep datasets (Sleep-EDF and ISRUC-III). Compared to the baseline methods, the results show that our knowledge distillation framework achieves state-of-the-art performance. SleepKD can significantly lighten the sleep model while maintaining its classification performance.
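As a rough illustration of the teacher-to-student transfer mentioned above, the sketch below computes the standard temperature-softened distillation loss (KL divergence between teacher and student outputs). It is not SleepKD's multi-level or teacher assistant module, which the abstract does not specify; the function names, logits, and temperature are assumptions chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    This is the generic distillation term, used here as a stand-in;
    it is an assumption, not the loss defined in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits over three classes.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.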

List of keywords

Multidisciplinary Topics and Applications -> MDA: Health and medicine
Humans and AI -> HAI: Applications

2873

CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

Philipp Altmann, Leonard Feuchtinger, Fabian Ritz, Jonas Nüßlein, Thomy Phan, Claudia Linnhoff-Popien

The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, containing only crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation and action spaces and provide a methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data augmentation in two different-sized procedurally generated mazes.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Robustness

2877

SemiGNN-PPI: Self-Ensembling Multi-Graph Neural Network for Efficient and Generalizable Protein–Protein Interaction Prediction

Ziyuan Zhao, Peisheng Qian, Xulei Yang, Zeng Zeng, Cuntai Guan, Wai Leong Tam, Xiaoli Li

Protein-protein interactions (PPIs) are crucial in various biological processes and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods suffer from significant performance degradation under complex real-world scenarios due to various factors, e.g., label scarcity and domain shift. In this paper, we propose a self-ensembling multi-graph neural network (SemiGNN-PPI) that can effectively predict PPIs while being both efficient and generalizable. In SemiGNN-PPI, we not only model the correlations between proteins but also explore the label dependencies by constructing and processing multiple graphs from the perspectives of both features and labels in the graph learning process. We further marry GNN with Mean Teacher to effectively leverage unlabeled graph-structured PPI data for self-ensemble graph learning. We also design multiple graph consistency constraints to align the student and teacher graphs in the feature embedding space, enabling the student model to better learn from the teacher model by incorporating more relationships. Extensive experiments on PPI datasets of different scales with different evaluation settings demonstrate that SemiGNN-PPI outperforms state-of-the-art PPI prediction methods, particularly in challenging scenarios such as training with limited annotations and testing on unseen data.
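The Mean Teacher scheme named above keeps the teacher's weights as an exponential moving average (EMA) of the student's weights. The sketch below shows only that generic update on flat parameter lists; the function name, the decay value, and the toy parameters are assumptions, and nothing here reproduces the graph consistency constraints of SemiGNN-PPI.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher parameters moved toward the student by an EMA step.

    teacher, student: equal-length lists of floats (hypothetical flat weights).
    decay: how much of the old teacher is kept at each step (assumed value).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# Toy run: the teacher drifts toward a fixed student over many steps.
teacher = [0.0]
student = [1.0]
for _ in range(50):
    teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher averages over many student snapshots, its predictions on unlabeled data tend to be smoother than those of any single student step, which is the usual motivation for using it as a consistency target.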

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine

2883

Locate, Refine and Restore: A Progressive Enhancement Network for Camouflaged Object Detection

Xiaofei Li, JIAXIN YANG, Shuohao LI, Jun Lei, Jun Zhang, Dong Chen

Camouflaged Object Detection (COD) intends to accurately segment objects that are visually integrated into their surroundings. Most existing methods tackle this issue with a single-step framework, which tends to degrade performance in the face of small objects, low-contrast objects, and objects with large variation in appearance. In this paper, we propose a novel Progressive Enhancement Network (PENet) for COD by imitating the human visual detection system, which follows a three-step detection fashion: locate objects, refine textures, and restore boundaries. Specifically, our PENet contains three key modules, i.e., the object location module (OLM), the group attention module (GAM), and the context feature restoration module (CFRM). The OLM is designed to position the object globally, the GAM is developed to refine both high-level semantic and low-level texture feature representation, and the CFRM is leveraged to effectively aggregate multi-level features for progressively restoring the clear boundaries. Extensive experimental results demonstrate that our PENet significantly outperforms 31 state-of-the-art methods on four widely used benchmark datasets. The code will be open source soon.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Recognition (object detection, categorization)

2905

Computing Abductive Explanations for Boosted Regression Trees

Gilles Audemard, Steve Bellart, Jean-Marie Lagniez, Pierre Marquis

We present two algorithms for generating (resp. evaluating) abductive explanations for boosted regression trees. Given an instance x and an interval I containing its value F(x) for the boosted regression tree F at hand, the generation algorithm returns a (most general) term t over the Boolean conditions in F such that every instance x’ satisfying t is such that F(x’) belongs to I. The evaluation algorithm tackles the corresponding inverse problem: given F, x, and a term t over the Boolean conditions in F such that t covers x, find the least interval I_t such that for every instance x’ covered by t we have F(x’) in I_t. Experiments on various datasets show that the two algorithms are practical enough to be used for generating (resp. evaluating) abductive explanations for boosted regression trees of significant size.

List of keywords

Machine Learning -> ML: Explainable/Interpretable machine learning
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Machine Learning -> ML: Regression

2907

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

Vishakha Patil, Vineet Nair, Ganesh Ghalme, Arindam Khan

We study the Improving Multi-Armed Bandit problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. We study the tension between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time and b) ensuring that arms with better long-term rewards get sufficient pulls even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm until it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer $\Omega(T)$ policy regret and $\Omega(k)$ competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is $O(k)$.

List of keywords

Machine Learning -> ML: Online learning
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Uncertainty in AI -> UAI: Sequential decision making

2911

Meta-Tsallis-Entropy Minimization: a new Self-Training approach for domain adaptation on text classification

Menglong Lu, Zhen Huang, ZHILIANG TIAN, Yunxiang Zhao, xuanyu fei, Dongsheng Li

Text classification is a fundamental task for natural language processing, and adapting text classification models across domains has broad applications. Self-training generates pseudo-examples from the model’s predictions and iteratively trains on the pseudo-examples, i.e., minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus, self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy minimization (MTEM). MTEM uses an instance adaptive Tsallis entropy to replace the Gibbs entropy and a meta-learning algorithm to optimize the instance adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose an approximation technique to approximate the second-order derivative involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model’s prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT by an average of 4 percent.
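For readers unfamiliar with the two entropies contrasted above, the sketch below evaluates the textbook definitions on a predicted class distribution: the Gibbs (Shannon) entropy and the Tsallis entropy with index q, which approaches the Gibbs entropy as q tends to 1. The instance adaptive choice of q and the meta-learning step are not described in the abstract and are not reproduced; function names and sample values are assumptions.

```python
import math

def gibbs_entropy(p):
    """Gibbs/Shannon entropy: -sum_i p_i * ln(p_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def tsallis_entropy(p, q):
    """Tsallis entropy: (1 - sum_i p_i^q) / (q - 1), for q > 0.

    Falls back to the Gibbs entropy at q = 1, which is its limit.
    """
    if abs(q - 1.0) < 1e-12:
        return gibbs_entropy(p)
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

# Hypothetical two-class prediction.
p = [0.5, 0.5]
near_gibbs = tsallis_entropy(p, 1.0001)   # close to ln(2)
quadratic = tsallis_entropy(p, 2.0)       # 1 - sum_i p_i^2
```

Varying q changes how strongly low-confidence predictions are penalized, which is the degree of freedom an instance adaptive scheme can tune per example.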

List of keywords

Natural Language Processing -> NLP: Text classification
Natural Language Processing -> NLP: Applications

2912

The Parameterized Complexity of Finding Concise Local Explanations

Sebastian Ordyniak, Giacomo Paesani, Stefan Szeider

We consider the computational problem of finding a smallest local explanation (anchor) for classifying a given feature vector (example) by a black-box model. After showing that the problem is NP-hard in general, we study various natural restrictions of the problem in terms of problem parameters to see whether these restrictions make the problem fixed-parameter tractable or not. We draw a detailed and systematic complexity landscape for combinations of parameters, including the size of the anchor, the size of the anchor’s coverage, and parameters that capture structural aspects of the problem instance, including rank-width and maximum difference.

List of keywords

Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning

2927

ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks

Alexandra Baier, Decky Aspandi, Steffen Staab

Multistep prediction models are essential for the simulation and model-predictive control of dynamical systems. Verifying the safety of such models is a multi-faceted problem requiring both system-theoretic guarantees as well as establishing trust with human users. In this work, we propose a novel approach, ReLiNet (Recurrent Linear Parameter Varying Network), to ensure safety for multistep prediction of dynamical systems. Our approach simplifies a recurrent neural network to a switched linear system that is constrained to guarantee exponential stability, which acts as a surrogate for safety from a system-theoretic perspective. Furthermore, ReLiNet’s computation can be reduced to a single linear model for each time step, resulting in predictions that are explainable by definition, thereby establishing trust from a human-centric perspective. Our quantitative experiments show that ReLiNet achieves prediction accuracy comparable to that of state-of-the-art recurrent neural networks, while achieving more faithful and robust explanations compared to the model-agnostic explanation method of LIME.

List of keywords

Machine Learning -> ML: Recurrent networks
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Safety and robustness

2929

Complex Contagion Influence Maximization: A Reinforcement Learning Approach

Haipeng Chen, Bryan Wilder, Wei Qiu, Bo An, Eric Rice, Milind Tambe

In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold) which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very few studies of complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves state-of-the-art performance on 9 real-world networks.

List of keywords

Search -> S: Combinatorial search and optimisation
Machine Learning -> ML: Reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Web and social networks

2969

Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++

Barath Mohan Umapathi, Kushal Chauhan, Pradeep Shenoy, Devarajan Sridharan

Reliable outlier detection is critical for real-world deployment of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as being impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations — “shaking” and “stirring” — which ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are inexpensive and readily computed at evaluation time. We test our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection, particularly on datasets with complex, natural images. We also show that our solutions work well with other types of generative models (generative flows and variational autoencoders) and that their efficacy is governed by each model’s reliance on local dependencies. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep generative models.

List of keywords

Computer Vision -> CV: Neural generative models, auto encoders, GANs
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Robustness

2989

ICDA: Illumination-Coupled Domain Adaptation Framework for Unsupervised Nighttime Semantic Segmentation

Chenghao Dong, Xuejing Kang, Anlong Ming

The performance of nighttime semantic segmentation has been significantly improved thanks to recent unsupervised methods. However, these methods still suffer from complex domain gaps, i.e., the challenging illumination gap and the inherent dataset gap. In this paper, we propose the illumination-coupled domain adaptation framework (ICDA) to effectively avoid the illumination gap and mitigate the dataset gap by coupling daytime and nighttime images as a whole with semantic relevance. Specifically, we first design a new composite enhancement method (CEM) that considers not only illumination but also spatial consistency to construct the source and target domain pairs, which provides the basic adaptation unit for our ICDA. Next, to avoid the illumination gap, we devise the Deformable Attention Relevance (DAR) module to capture the semantic relevance inside each domain pair, which can couple the daytime and nighttime images at the feature level and adaptively guide the predictions of nighttime images. Besides, to mitigate the dataset gap and acquire domain-invariant semantic relevance, we propose the Prototype-based Class Alignment (PCA) module, which improves the usage of category information and performs fine-grained alignment. Extensive experiments show that our method reduces the complex domain gaps and achieves state-of-the-art performance for nighttime semantic segmentation.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

2998

Efficient NLP Model Finetuning via Multistage Data Filtering

Xu Ouyang, Shahina Mohd Azam Ansari, Felix Lin, Yangfeng Ji

As model finetuning is central to modern NLP, we set out to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To this end, we set out to filter training examples in a streaming fashion, in tandem with training the target model. We use two key techniques: (1) automatically determine a training loss threshold for skipping backward training passes; (2) run a meta predictor for further skipping forward training passes. We integrate the above techniques in a holistic, three-stage training process. On a diverse set of benchmarks, our method reduces the required training examples by up to 5.3× and training time by up to 6.8×, while only seeing minor accuracy degradation. Our method is effective even for training one epoch, where each training example is encountered only once. It is simple to implement and is compatible with existing finetuning techniques.

List of keywords

Natural Language Processing -> NLP: Text classification
Machine Learning -> ML: Automated machine learning

3002

Scalable Verification of Strategy Logic by Three-Valued Abstraction

Francesco Belardinelli, Angelo Ferrando, Wojciech Jamroga, Vadim Malvone, Aniello Murano

The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. Several fragments of this logic have been defined to tackle this issue, but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

3034

Minimizing Reachability Times on Temporal Graphs via Shifting Labels

Argyrios Deligkas, Eduard Eiben, George Skretas

We study how we can accelerate the spreading of information in temporal graphs via shifting operations; a problem that captures real-world applications varying from information flows to distribution schedules. In a temporal graph there is a set of fixed vertices and the available connections between them change over time in a predefined manner. We observe that, in some cases, shifting some connections, i.e., advancing or delaying them, can decrease the time required to reach from some vertex (source) to another vertex. We study how we can minimize the maximum time a set of sources needs to reach every vertex, when we are allowed to shift some of the connections. If we restrict the allowed number of changes, we prove that, already for a single source, the problem is NP-hard, and W[2]-hard when parameterized by the number of changes. Then we focus on unconstrained number of changes. We derive a polynomial-time algorithm when there is one source. When there are two sources, we show that the problem becomes NP-hard; on the other hand, we design an FPT algorithm parameterized by the treewidth of the graph plus the lifetime of the optimal solution, that works for any number of sources. Finally, we provide polynomial-time algorithms for several graph classes.
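To make the effect of a shift concrete, the sketch below computes earliest arrival times from one source in a small temporal graph and then repeats the computation after moving a single label. It assumes undirected edges with one integer label each and temporal paths with strictly increasing labels; these conventions, the function name, and the three-vertex example are assumptions for illustration, not the paper's algorithms.

```python
def earliest_arrival(num_vertices, edges, source):
    """Earliest arrival time at every vertex from `source`.

    edges: list of (u, v, t), an undirected connection available at time t.
    A connection at time t can be used from a vertex reached strictly before t
    (assumed strict temporal paths). Unreachable vertices get infinity.
    """
    inf = float("inf")
    arrival = [inf] * num_vertices
    arrival[source] = 0
    changed = True
    while changed:  # relax until no arrival time improves
        changed = False
        for u, v, t in edges:
            for a, b in ((u, v), (v, u)):
                if arrival[a] < t < arrival[b]:
                    arrival[b] = t
                    changed = True
    return arrival

# Path 0 - 1 - 2: the second connection appears before the first one,
# so vertex 2 cannot be reached from vertex 0.
original = [(0, 1, 3), (1, 2, 2)]
before = earliest_arrival(3, original, source=0)

# Advancing the first connection from time 3 to time 1 repairs the path.
shifted = [(0, 1, 1), (1, 2, 2)]
after = earliest_arrival(3, shifted, source=0)
```

In this toy instance a single shift turns an unreachable vertex into one reached at time 2, so the maximum reachability time of the source drops from infinity to 2.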

List of keywords

Planning and Scheduling -> PS: Theoretical foundations of planning
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Scheduling

3037

RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models

Xingchen Zhou, Ying He, F Richard Yu, Jianqiang Li, You Li

The emergence of Neural Radiance Fields (NeRF) has promoted the development of synthesized high-fidelity views of the intricate real world. However, it is still a very demanding task to repaint the content in NeRF. In this paper, we propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes. Our work leverages existing diffusion models to guide changes in the designated 3D content. Specifically, we first semantically select the object we want to modify, and a pre-trained diffusion model then guides the NeRF model to generate new 3D objects, which can improve the editability, diversity, and application range of NeRF. Experimental results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts, including editing appearance, shape, etc. We validate our method on real-world datasets and synthetic-world datasets for these editing tasks.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs

3040

Dual Video Summarization: From Frames to Captions

Zhenzhen Hu, zhenshan wang, Zijie Song, Richang Hong

Video summarization and video captioning both condense the video content from the perspective of visual and text modes, i.e., keyframe selection and language description generation. Existing video-and-language learning models commonly sample multiple frames for training instead of observing all. These sampled deputies greatly improve computational efficiency, but do they represent the original video content well enough, without redundancy? In this work, we propose a dual video summarization framework and verify it in the context of video captioning. Given the video frames, we first extract the visual representation based on the ViT model fine-tuned on the video-text domain. Then we summarize the keyframes according to the frame-level score. To compress the number of keyframes as much as possible while ensuring the quality of captioning, we learn a cross-modal video summarizer to select the most semantically consistent frames according to the pseudo score label. The top $K$ frames ($K$ is no more than $3\%$ of the entire video) are chosen to form the video representation. Moreover, to evaluate the static appearance and temporal information of video, we design the ranking scheme of video representation from two aspects: feature-oriented and sequence-oriented. Finally, we generate the descriptions with a lightweight LSTM decoder. The experimental results on the MSR-VTT and MSVD datasets reveal that, for a generative task such as video captioning, a small number of keyframes can convey the same semantic information and perform well on captioning, or even better than the original sampling.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Video analysis and understanding

3049

Neuro-Symbolic Class Expression Learning

Caglar Demir, Axel-Cyrille Ngonga Ngomo

Deep learning based models have been effectively applied to tackle various problems in many disciplines. Yet, their predictions are often at most post-hoc and locally explainable. In contrast, predicted class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic models have been successfully applied to learn class expressions, their large-scale applications have been hindered by their impractical runtimes. Arguably, the reliance on myopic heuristic functions contributes to this limitation. We propose a novel neuro-symbolic class expression learning model, Drill, to mitigate this limitation. By learning non-myopic heuristic functions with deep Q-learning, Drill efficiently steers the standard search procedure in a quasi-ordered search space towards goal states. Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that Drill converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirm that Drill converges to goal states significantly faster (p-value < 1%) than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of Drill, including pre-trained models, training and evaluation scripts.

List of keywords

Machine Learning -> ML: Representation learning
Machine Learning -> ML: Deep reinforcement learning

3072

Learning Preference Models with Sparse Interactions of Criteria

margot herin, Patrice Perny, Nataliya Sokolovska

Multicriteria decision making requires defining the result of conflicting and possibly interacting criteria. Allowing criteria interactions in a decision model increases the complexity of the preference learning task due to the combinatorial nature of the possible interactions. In this paper, we propose an approach to learn a decision model in which the interaction pattern is revealed from preference data and kept as simple as possible. We consider weighted aggregation functions like multilinear utilities or Choquet integrals, admitting representations including non-linear terms measuring the joint benefit or penalty attached to some combinations of criteria. The weighting coefficients known as Möbius masses model positive or negative synergies among criteria. We propose an approach to learn the Möbius masses, based on iteratively reweighted least squares for sparse recovery, and dualization to improve scalability. This approach is applied to learn sparse representations of the multilinear utility model and conjunctive/disjunctive forms of the discrete Choquet integral from preference examples, in aggregation problems possibly involving more than 20 criteria.
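The role of the Möbius masses can be illustrated with the standard Möbius-form expressions of the two models mentioned above: the discrete Choquet integral sums each mass times the minimum of the criteria in its subset, and the multilinear utility sums each mass times their product. The sketch below evaluates both on a sparse set of masses; the masses, scores, and function names are made up for the example, and the learning procedure itself is not shown.

```python
from math import prod

def choquet(x, mobius):
    """Choquet integral in Möbius (conjunctive) form: sum_A m(A) * min_{i in A} x_i."""
    return sum(m * min(x[i] for i in subset) for subset, m in mobius.items())

def multilinear(x, mobius):
    """Multilinear utility in Möbius form: sum_A m(A) * prod_{i in A} x_i."""
    return sum(m * prod(x[i] for i in subset) for subset, m in mobius.items())

# Hypothetical sparse masses over two criteria: two singletons plus one
# positive synergy on the pair {0, 1}. Subsets must be non-empty.
masses = {
    frozenset({0}): 0.3,
    frozenset({1}): 0.3,
    frozenset({0, 1}): 0.4,
}
scores = (0.5, 0.8)

c = choquet(scores, masses)       # 0.3*0.5 + 0.3*0.8 + 0.4*min(0.5, 0.8)
u = multilinear(scores, masses)   # 0.3*0.5 + 0.3*0.8 + 0.4*(0.5*0.8)
```

A sparse model in this sense is one where only a few subsets carry a non-zero mass, so the dictionary stays small even when the number of criteria is large.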

List of keywords

Machine Learning -> ML: Learning preferences or rankings
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Uncertainty in AI -> UAI: Decision and utility theory

3073

Simplification and Improvement of MMS Approximation

Hannaneh Akrami, Jugal Garg, Eklavya Sharma, Setareh Taki

We consider the problem of fairly allocating a set of indivisible goods among n agents with additive valuations, using the popular fairness notion of maximin share (MMS). Since MMS allocations do not always exist, a series of works established existence results and algorithms for approximate MMS allocations. The current best approximation factor for which existence is known is (3/4 + 1/(12n)) [Garg and Taki, 2021]. Most of these results are based on complicated analyses, especially those providing a factor better than 2/3. Moreover, since no tight example is known for the Garg-Taki algorithm, it is unclear whether this is the best factor of this approach. In this paper, we significantly simplify the analysis of this algorithm and also improve the existence guarantee to a factor of (3/4 + min(1/36, 3/(16n-4))). For small n, this provides a noticeable improvement. Furthermore, we present a tight example of this algorithm, showing that this may be the best factor one can hope for with the current techniques.
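The two guarantees quoted above can be compared numerically. The sketch below evaluates both expressions exactly with rational arithmetic, reading the earlier factor as 3/4 + 1/(12n); that reading, the function names, and the sample values of n are assumptions made for the illustration.

```python
from fractions import Fraction

def garg_taki_factor(n):
    """Earlier guarantee, read here as 3/4 + 1/(12n)."""
    return Fraction(3, 4) + Fraction(1, 12 * n)

def improved_factor(n):
    """Guarantee stated in the abstract: 3/4 + min(1/36, 3/(16n - 4))."""
    return Fraction(3, 4) + min(Fraction(1, 36), Fraction(3, 16 * n - 4))

# Compare the two factors for a few agent counts (chosen arbitrarily).
for n in (4, 10, 100):
    old, new = garg_taki_factor(n), improved_factor(n)
    print(n, old, new, float(new - old))
```

For n = 4 the expressions evaluate to 37/48 and 7/9, and for n = 10 to 91/120 and 10/13; both factors approach 3/4 as n grows, with the second expression staying the larger one for the values tried here.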

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice

3078

Negative Flux Aggregation to Estimate Feature Attributions

Xin Li, Deng Pan, CHNEGYIN LI, Yao Qiang, Dongxiao Zhu

There are increasing demands for understanding deep neural networks’ (DNNs) behavior, spurred by growing security and/or transparency concerns. Due to the multi-layer nonlinearity of deep neural network architectures, explaining DNN predictions remains an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input features’ attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate the attribution map. Unlike previous techniques, ours neither relies on fitting a surrogate model nor needs any path integration of gradients. Both qualitative and quantitative experiments demonstrate the superior performance of NeFLAG in generating more faithful attribution maps than the competing methods.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI

3093

Spotlight News Driven Quantitative Trading Based on Trajectory Optimization

Mengyuan Yang, Qianqiao Liang, Xiaolin Zheng, Mengying Zhu, Menghan Wang

News-driven quantitative trading (NQT) has been popularly studied in recent years. Most existing NQT methods follow a two-step paradigm, i.e., first analyzing markets by a financial prediction task and then making trading decisions, which is doomed to failure due to the nearly futile financial prediction task. To bypass the financial prediction task, in this paper, we focus on the reinforcement learning (RL) based NQT paradigm, which leverages news to make profitable trading decisions directly. We propose a novel NQT framework, SpotlightTrader, based on decision trajectory optimization, which can effectively stitch together a continuous and flexible sequence of trading decisions to maximize profits. In addition, we enhance this framework by constructing a spotlight-driven state trajectory that obeys a stochastic process with irregular abrupt jumps caused by spotlight news. Furthermore, in order to adapt to non-stationary financial markets, we propose an effective training pipeline for this framework, which blends offline pretraining with online finetuning to balance exploration and exploitation effectively during online trading. Extensive experiments on three real-world datasets demonstrate our proposed model’s superiority over the state-of-the-art NQT methods.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning

3095

Learning Small Decision Trees with Large Domain

Eduard Eiben, Sebastian Ordyniak, Giacomo Paesani, Stefan Szeider

One favors decision trees (DTs) of the smallest size or depth to facilitate explainability and interpretability. However, learning such an optimal DT from data is well known to be NP-hard. To overcome this complexity barrier, Ordyniak and Szeider (AAAI 21) initiated the study of optimal DT learning from the parameterized complexity perspective. They showed that solution size (i.e., number of nodes or depth of the DT) is insufficient to obtain fixed-parameter tractability (FPT). Therefore, they proposed an FPT algorithm that utilizes two auxiliary parameters: the maximum difference (as a structural property of the data set) and the maximum domain size. They left open the question of whether bounding the maximum domain size is necessary. The main result of this paper answers this question. We present FPT algorithms for learning a smallest or lowest-depth DT from data, with solution size and maximum difference as the only parameters. Thus, our algorithm is significantly more potent than the one by Ordyniak and Szeider, as it can handle problem inputs with features that range over unbounded domains. We also close several gaps concerning the quality of approximation one obtains by only considering DTs based on minimum support sets.

List of keywords

Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning

3109

Safe Multi-agent Learning via Trapping Regions

Aleksander Czechowski, Frans Oliehoek

One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, since, in general, the joint policy of a collection of individual, self-serving agents is not guaranteed to converge when the agents learn concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier for deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we propose to apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. Upon verification of the direction of learning dynamics, the resulting trajectories are guaranteed not to escape such sets during the learning process. As a result, it is ensured that, despite the uncertainty over convergence of the applied algorithms, learning will never form hazardous joint strategy combinations. We introduce a binary partitioning algorithm for verification of trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where learning dynamics are not known. In addition, via a fixed point argument, we show the existence of a learning equilibrium within a trapping region. We demonstrate the applications to a regularized version of the Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in SUMO, a state-of-the-art open-source microscopic traffic simulator, and a mathematical model of economic competition.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning

3115

Fairly Allocating Goods and (Terrible) Chores

Hadi Hosseini, Aghaheybat Mammadov, Tomasz Wąs

We study the fair allocation of a mixture of indivisible goods and chores under lexicographic preferences, a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation have remained a notable open problem. By identifying a class of instances with “terrible chores”, we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and Pareto optimal allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice

3117

Finding an ϵ-Close Minimal Variation of Parameters in Bayesian Networks

Bahare Salmani, Joost-Pieter Katoen

This paper addresses the ε-close parameter tuning problem for Bayesian networks (BNs): find a minimal ε-close amendment of probability entries in a given set of (rows in) conditional probability tables (CPTs) that makes a given quantitative constraint on the BN valid. Based on the state-of-the-art “region verification” techniques for parametric Markov chains, we propose an algorithm for this problem whose capabilities go beyond any existing BN tools. Our experiments show that ε-close tuning of large BN benchmarks with up to eight (explicitly varied) parameters is feasible. In particular, by allowing (i) varied parameters in multiple CPTs and (ii) inter-CPT parameter dependencies, we treat subclasses of parametric BNs that have received less attention so far.

List of keywords

Uncertainty in AI -> UAI: Bayesian networks

3127

Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size

Darshana Abeyrathna, Ahmed Abouzeid, Bimal Bhattarai, Charul Giri, Sondre Glimsdal, Ole-Christoffer Granmo, Lei Jiao, Rupsa Saha, Jivitesh Sharma, Svein Tunheim, Xuan Zhang

Tsetlin Machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). As such, they become less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning — Clause Size Constrained TMs (CSC-TMs) — where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches one literal. We finally analyze CSC-TM power consumption and derive new convergence properties.

List of keywords

Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Other
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP

3138

Sketch Recognition via Part-based Hierarchical Analogical Learning

Kezhen Chen, Ken Forbus, Balaji Vasan Srinivasan, Niyati Chhaya, Madeline Usher

Sketch recognition has been studied for decades, but it is far from solved. Drawing styles are highly variable across people and adapting to idiosyncratic visual expressions requires data-efficient learning. Explainability also matters, so that users can see why a system got confused about something. This paper introduces a novel part-based approach for sketch recognition, based on hierarchical analogical learning, a new method to apply analogical learning to qualitative representations. Given a sketched object, our system automatically segments it into parts and constructs multi-level qualitative representations of them. Our approach performs analogical generalization at multiple levels of part descriptions and uses coarse-grained results to guide interpretation at finer levels. Experiments on the Berlin TU dataset and the Coloring Book Objects dataset show that the system can learn explainable models in a data-efficient manner.

List of keywords

Humans and AI -> HAI: Cognitive modeling
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

3139

The Computational Complexity of Single-Player Imperfect-Recall Games

Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, Paul Goldberg

We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternatives to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and the other uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings relate those three solution concepts of a game to solution concepts of a polynomial maximization problem, namely global optima, optimal points with respect to subsets of variables, and Karush–Kuhn–Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of ex-ante optimal or equilibrium strategies.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Other
Uncertainty in AI -> UAI: Decision and utility theory

3140

Can You Improve My Code? Optimizing Programs with Local Search

Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor, Levi Lelis

This paper introduces a system that performs local search for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines, to partition a synthesis task into several smaller tasks that can be solved with existing brute-force synthesis algorithms. POLIS improves a single line of the program while keeping the remaining lines fixed, and continues iterating until it is unable to improve the objective value of the program. POLIS was evaluated with a 27-person user study, where participants were instructed and rewarded to write programs that maximized the score of two single-agent games: lunar lander and highway. POLIS was able to substantially improve the participants’ programs with respect to the game scores. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.

List of keywords

Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-AI collaboration
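
The line-by-line improvement loop described in the POLIS abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `line_wise_local_search`, the fixed `candidates` pool (standing in for whatever a brute-force synthesizer would enumerate for one line), and the toy objective are all hypothetical and chosen only to show the control flow of improving one line while the others stay fixed.

```python
from typing import Callable, List, Sequence


def line_wise_local_search(
    program: List[str],
    candidates: Sequence[str],
    objective: Callable[[List[str]], float],
) -> List[str]:
    """Improve one line at a time while keeping the remaining lines fixed.

    Hypothetical sketch: `candidates` replaces a real per-line synthesizer.
    """
    best = list(program)
    best_score = objective(best)
    improved = True
    while improved:  # stop once a full pass yields no improvement
        improved = False
        for i in range(len(best)):
            for cand in candidates:
                # Replace only line i; every other line is unchanged.
                trial = best[:i] + [cand] + best[i + 1:]
                score = objective(trial)
                if score > best_score:
                    best, best_score = trial, score
                    improved = True
    return best


def toy_objective(lines: List[str]) -> float:
    """Toy score: each line is an integer literal; closer to a sum of 10 is better."""
    return -abs(sum(int(x) for x in lines) - 10)


result = line_wise_local_search(["0", "0", "0"], ["0", "1", "2", "3", "4"], toy_objective)
print(result, toy_objective(result))
```

Because a replacement is accepted only when it strictly increases the objective and the candidate pool is finite, the loop terminates at a local optimum with respect to single-line changes.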

3153

Disentanglement of Latent Representations via Causal Interventions

Gaël Gendron, Michael Witbrock, Gillian Dobbie

The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.

List of keywords

Knowledge Representation and Reasoning -> KRR: Causality
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Autoencoders

3155

Exploring Structural Similarity in Fitness Landscapes via Graph Data Mining: A Case Study on Number Partitioning Problems

Mingyu Huang, Ke Li

One of the most common problem-solving heuristics is reasoning by analogy. For a given problem, a solver can be viewed as performing a strategic walk on its fitness landscape. Thus, if a solver works for one problem instance, we expect it to also be effective for other instances whose fitness landscapes share structural similarities with it. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, using the local optima network as a proxy for fitness landscapes, this paper proposes to leverage graph data mining techniques to conduct qualitative and quantitative analyses that explore the latent topological structural information embedded in those landscapes. In our experiments, we use the number partitioning problem as a case study, and our empirical results support the overall assumption that structural similarity exists between landscapes within neighboring dimensions. In addition, experiments on simulated annealing demonstrate that the performance of a metaheuristic solver is similar on structurally similar landscapes.

List of keywords

Data Mining -> DM: Exploratory data mining
Data Mining -> DM: Data visualization
Search -> S: Combinatorial search and optimisation

3162

DiffAR: Adaptive Conditional Diffusion Model for Temporal-augmented Human Activity Recognition

Shuokang Huang, Po-Yu Chen, Julie McCann

Human activity recognition (HAR) is a fundamental sensing and analysis technique that supports diverse applications, such as smart homes and healthcare. In device-free and non-intrusive HAR, WiFi channel state information (CSI) captures wireless signal variations caused by human interference without the need for video cameras or on-body sensors. However, current CSI-based HAR performance is hampered by incomplete CSI recordings due to fixed window sizes in CSI collection and human/machine errors that incur missing values in CSI. To address these issues, we propose DiffAR, a temporal-augmented HAR approach that improves HAR performance by augmenting CSI. DiffAR devises a novel Adaptive Conditional Diffusion Model (ACDM) to synthesize augmented CSI, which tackles the issue of fixed windows by forecasting and handles missing values with imputation. Compared to existing diffusion models, ACDM improves the synthesis quality by guiding progressive synthesis with step-specific conditions. DiffAR further exploits an ensemble classifier for activity recognition using both raw and augmented CSI. Extensive experiments on four public datasets show that DiffAR achieves the best synthesis quality of augmented CSI and outperforms state-of-the-art CSI-based HAR methods in recognition performance. The source code of DiffAR is available at https://github.com/huangshk/DiffAR.

List of keywords

Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Applications

3171

On Optimal Strategies for Wordle and General Guessing Games

Michael Cunanan, Michael Thielscher

The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contributions are several theorems that build towards a general theory for proving the optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.

List of keywords

Search -> S: Combinatorial search and optimisation
Multidisciplinary Topics and Applications -> MDA: Game playing
Search -> S: Applications

3174

Cognitively Inspired Learning of Incremental Drifting Concepts

Mohammad Rostami, Aram Galstyan

Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference.

List of keywords

Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems

3195

Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models

Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang

In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., “brown” → “brow̨n”). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters. We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.

List of keywords

Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Neural generative models, auto encoders, GANs

3200

RZCR: Zero-shot Character Recognition via Radical-based Reasoning

Xiaolei Diao, Daqian Shi, Hao Tang, Qiang Shen, Yanzeng Li, Lei Wu, Hao Xu

The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets. Character image datasets are also affected by such unbalanced data distribution due to differences in character usage frequency. Thus, current character recognition methods are limited when applied in the real world, especially for the categories in the tail that lack training samples, e.g., uncommon characters. In this paper, we propose a zero-shot character recognition framework via radical-based reasoning, called RZCR, to improve the recognition performance of few-sample character categories in the tail. Specifically, we exploit radicals, the graphical units of characters, by decomposing and reconstructing characters according to orthography. RZCR consists of a visual semantic fusion-based radical information extractor (RIE) and a knowledge graph character reasoner (KGR). RIE aims to recognize candidate radicals and their possible structural relations from character images in parallel. The results are then fed into KGR to recognize the target character by reasoning with a knowledge graph. We validate our method on multiple datasets, and RZCR shows promising experimental results, especially on few-sample character datasets.

List of keywords

Computer Vision -> CV: Vision and language
Multidisciplinary Topics and Applications -> MDA: Humanities

3221

Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification

Zheng Gong, Guifeng Wang, Ying Sun, Qi Liu, Yuting Ning, Hui Xiong, Jingyu Peng

Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noise of graphs, induced by hidden anomalies connected with a considerable number of benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noise and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world GAD datasets demonstrate that the proposed framework achieves significantly better detection quality than state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.

List of keywords

Data Mining -> DM: Applications
Data Mining -> DM: Anomaly/outlier detection
Data Mining -> DM: Mining graphs

3222

Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail Bragin, Ji Li, Caiwen Ding

Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, unimportant model weights are zeroed (pruned) and the derived compact model is trained to improve final accuracy. The pruned weights are treated as useless and discarded, which usually leads to significant degradation in model accuracy. In this paper, we focus on attention head pruning, as attention heads are a key component of Transformer-based language models and provide interpretable knowledge. We reveal the relationship between pruned attention heads and retained heads and provide a solution, named peer distillation, to recycle the discarded knowledge from the pruned heads. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, removing the need for time-consuming manual hyperparameter tuning. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using the BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing heads by over 58% on average and outperforming state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).

List of keywords

Natural Language Processing -> NLP: Language models

3243

MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning

Zhehua Zhong, Tianyi Chen, Zhen Wang

Fine-tuning large-scale pre-trained language models has proven effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such uses of adversarial training correspond to pure-strategy games, which are inherently limited in the scope of their strategies and therefore leave room for improvement. To push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent and establish MAT by a sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark experiments on large-scale pre-trained models, such as BERT and RoBERTa. MAT significantly outperforms the state-of-the-art methods on both the GLUE and ANLI benchmarks in terms of generalization and robustness.

List of keywords

Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Other

3254

SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs

Sheng Tian, Jihai Dong, Jintang Li, Wenlong Zhao, Xiaolong Xu, Baokun Wang, Bowen Song, Changhua Meng, Tianyi Zhang, Liang Chen

Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As instances that appear in the real world are naturally connected and can be represented with graphs, graph neural networks have become increasingly popular in tackling the anomaly detection problem. Despite the promising results, research on anomaly detection has almost exclusively focused on static graphs, while the mining of anomalous patterns from dynamic graphs is rarely studied but has significant application value. In addition, anomaly detection is typically tackled from semi-supervised perspectives due to the lack of sufficient labeled data. However, most proposed methods are limited to merely exploiting labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By combining a time-equipped memory bank and a pseudo-label contrastive learning module, SAD is able to fully exploit the potential of large unlabeled samples and uncover underlying anomalies on evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies from dynamic graphs and outperforms existing advanced methods even when provided with only a little labeled data.

List of keywords

Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Time series and data streams

3260

Beyond Pure Text: Summarizing Financial Reports Based on Both Textual and Tabular Data

Ziao Wang, Zelin Jiang, Xiaofeng Zhang, Jaehyeon Soon, Jialu Zhang, Wang Xiaoyao, Hongwei Du

Abstractive text summarization aims to generate concise summaries that preserve both the salient information and the overall semantic meaning of the given documents. However, real-world documents, e.g., financial reports, generally contain rich data such as charts and tabular data, which invalidate most existing text summarization approaches. We therefore propose a novel approach that simultaneously summarizes both textual and tabular data. In particular, we first manually construct a “table+text → summary” dataset. Then, the tabular data is embedded in both a row-wise and a column-wise manner, and the textual data is encoded at the sentence level via a pre-trained model. We propose a salient detector gate that is applied to each pair of row/column and sentence embeddings. The highly correlated content is considered salient information that must be summarized. Extensive experiments have been performed on our constructed dataset, and the promising results demonstrate the effectiveness of the proposed approach w.r.t. a number of both automatic and human evaluation criteria.

List of keywords

Natural Language Processing -> NLP: Summarization
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Language generation

3263

On Adversarial Robustness of Demographic Fairness in Face Attribute Recognition

Huimin Zeng, Zhenrui Yue, Lanyu Shang, Yang Zhang, Dong Wang

Demographic fairness has become a critical objective when developing modern visual models for identity-sensitive applications, such as face attribute recognition (FAR). While great efforts have been made to improve the fairness of the models, the adversarial robustness of that fairness (e.g., whether the fairness of the models can still be maintained under potentially malicious fairness attacks) has been largely ignored. Therefore, this paper explores the adversarial robustness of demographic fairness in FAR applications from both attacking and defending perspectives. In particular, we first present a novel fairness attack, which aims at corrupting the demographic fairness of face attribute classifiers. Next, to mitigate the effect of the fairness attack, we design an efficient defense algorithm called robust-fair training. With this defense, face attribute classifiers learn how to combat the bias introduced by the fairness attack. As such, the face attribute classifiers are not only trained to be fair, but the fairness is also robust. Our extensive experimental results show the effectiveness of both our proposed attack and defense methods across various model architectures and FAR applications. We believe our work could serve as a strong baseline for future work on robust demographic fairness.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Bias
Computer Vision -> CV: Bias, fairness and privacy

3271

IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation

Jeongho Kim, Hanbeen Lee, Simon S. Woo

Knowledge distillation (KD) is an effective method for transferring the knowledge of a teacher model to a student model, with the aim of improving the latter’s performance efficiently. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements on various tasks, only marginal improvements are shown in student networks due to their limited model capacity. In this work, to address the student model’s limitation, we propose a novel flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) to improve the overall performance of the student model by directly distilling the teacher’s knowledge into branches of student models. The generated output of IFD, which is trained by the teacher model, is effectively combined by attentive logit. We use only a few blocks of the student and the trained IFD during inference, requiring an equal or smaller number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin on various datasets and tasks without extra computation.

List of keywords

Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
Computer Vision -> CV: Representation learning

3273

Random Assignment of Indivisible Goods under Constraints

Yasushi Kawase, Hanna Sumita, Yu Yokoi

We investigate the problem of random assignment of indivisible goods, in which each agent has an ordinal preference and a constraint. Our goal is to characterize the conditions under which a random assignment that simultaneously satisfies efficiency and envy-freeness always exists. The probabilistic serial mechanism ensures the existence of such an assignment for the unconstrained setting. In this paper, we consider a more general setting in which each agent can consume a set of items only if the set satisfies her feasibility constraint. Such constraints must be taken into account in student course placements, employee shift assignments, and so on. We demonstrate that an efficient and envy-free assignment may not exist even for the simple case of partition matroid constraints, where the items are categorized, and each agent demands one item from each category. We then identify special cases in which an efficient and envy-free assignment always exists. For these cases, the probabilistic serial cannot be naturally extended; therefore, we provide mechanisms to find the desired assignment using various approaches.
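
As a rough illustration of the probabilistic serial mechanism that the abstract cites for the unconstrained setting, the standard simultaneous-eating procedure can be sketched as follows. This is a minimal sketch of the classical unconstrained mechanism only, not of the constrained mechanisms the paper proposes; the function name `probabilistic_serial` and the input format are assumptions made for illustration.

```python
from fractions import Fraction


def probabilistic_serial(preferences, items):
    """Simultaneous eating: preferences[a] is agent a's ranking, best first."""
    remaining = {item: Fraction(1) for item in items}
    shares = {a: {item: Fraction(0) for item in items} for a in preferences}
    while any(amount > 0 for amount in remaining.values()):
        # Each agent eats its most-preferred item that still has supply.
        target = {}
        for agent, ranking in preferences.items():
            for item in ranking:
                if remaining[item] > 0:
                    target[agent] = item
                    break
        if not target:
            break  # leftover items are ranked by nobody
        eaters = {}
        for item in target.values():
            eaters[item] = eaters.get(item, 0) + 1
        # Advance time until the first targeted item runs out.
        step = min(remaining[item] / count for item, count in eaters.items())
        for agent, item in target.items():
            shares[agent][item] += step
            remaining[item] -= step
    return shares


# Two agents with identical rankings split every item equally.
example = probabilistic_serial({"x": ["a", "b"], "y": ["a", "b"]}, ["a", "b"])
```

Each entry `shares[agent][item]` is the probability with which the agent receives the item; exact fractions are used so that supplies reach zero without floating-point drift.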

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design

3281

Some General Identification Results for Linear Latent Hierarchical Causal Structure

Zhengming Chen, Feng Xie, Jie Qiao, Zhifeng Hao, Ruichu Cai

We study the problem of learning hierarchical causal structure among latent variables from many measured variables. Although a few methods are able to recover the latent hierarchical causal structure, they rely on restrictive assumptions, such as a tree-structured graph, no “triangle” structure, or non-Gaussian noise. In this paper, we consider a more general and challenging scenario in which there are no tree-structure or “triangle”-structure restrictions on the graph and the noise terms of the data may be partially non-Gaussian. We show that the hierarchical causal structure is identifiable under milder graphical conditions. Specifically, we first show that, based on second-order statistics, the latent hierarchical structure can be identified up to the Markov equivalence classes over latent variables. Then, we show that some directions among latent variables in those Markov equivalence classes can be inferred based on partial higher-order statistics. Further, we design a method to efficiently learn the latent hierarchical structure. The experimental results on synthetic data verify the efficiency of the proposed method.

List of keywords

Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference

3294

MMPN: Multi-supervised Mask Protection Network for Pansharpening

Changjie Chen, Yong Yang, Shuying Huang, Wei Tu, Weiguo Wan, Shengna Wei

Pansharpening fuses a panchromatic (PAN) image with a multispectral (MS) image to obtain a high-spatial-resolution multispectral (HRMS) image. Deep learning-based pansharpening methods usually apply the convolution operation to extract features and only consider the similarity of gradient information between PAN and HRMS images, resulting in edge blur and spectral distortion in the fusion results. To solve these problems, a multi-supervised mask protection network (MMPN) is proposed to prevent spatial information from being damaged and overcome spectral distortion in the learning process. Firstly, by analyzing the relationships between high-resolution images and corresponding degraded images, a mask protection strategy (MPS) for edge protection is designed to guide the recovery of fused images. Then, based on the MPS, an MMPN containing four branches is constructed to generate the fusion and mask protection images. In MMPN, each branch employs a dual-stream multi-scale feature fusion module (DMFFM), which is built to extract and fuse the features of two input images. Finally, different loss terms are defined for the four branches and combined into a joint loss function for network training. Experiments on simulated and real satellite datasets show that our method is superior to state-of-the-art methods both subjectively and objectively.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision

3308

Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning

Xiaoli Tang, Han Yu

Auction-based Federated Learning (AFL) is a key technology to enable open collaboration among self-interested data consumers and data owners. Existing AFL approaches cannot manage the mutual influence among multiple data consumers competing to enlist data owners. Moreover, they cannot support a single data owner joining multiple data consumers simultaneously. To bridge these gaps, we propose the Multi-Agent Reinforcement Learning for AFL (MARL-AFL) approach to steer data consumers to bid strategically towards an equilibrium with desirable overall system characteristics. We design a temperature-based reward reassignment scheme to make trade-offs between cooperation and competition among AFL data consumers. In this way, MARL-AFL can reach an equilibrium state that ensures individual data consumers can achieve good utility, while preserving system-level social welfare. To circumvent potential collusion behaviors among data consumers, we introduce a bar agent to set a personalized bidding lower bound for each data consumer. Extensive experiments on six commonly adopted benchmark datasets show that MARL-AFL is significantly more advantageous than six state-of-the-art approaches, outperforming the best by 12.2%, 1.9% and 3.4% in terms of average social welfare, revenue and model accuracy, respectively.

List of keywords

Machine Learning -> ML: Federated learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning

3314

iRe^2f: Rethinking Effective Refinement in Language Structure Prediction via Efficient Iterative Retrospecting and Reasoning

Zuchao Li, Xingyi Guo, Letian Peng, Lefei Zhang, Hai Zhao

Refinement plays a critical role in language structure prediction, a process that deals with complex situations such as structural edge interdependencies. Since language structure prediction is usually modeled as graph parsing, typical refinement methods involve taking an initial parsing graph as input and refining it using language input and other relevant information. Intuitively, a refinement component, i.e., a refiner, should be lightweight and efficient, as it is only responsible for correcting faults in the initial graph. However, current refiners add a significant burden to the parsing process due to their reliance on time-consuming encoding-decoding procedures on the language input and graph. To make the refiner more practical for real-world applications, this paper proposes a lightweight but effective iterative refinement framework, iRe^2f, based on iterative retrospecting and reasoning without involving the re-encoding process on the graph. iRe^2f iteratively refines the parsing graph based on interaction between graph and sequence and efficiently learns the shortcut to update the sequence and graph representations in each iteration. The shortcut is calculated based on the graph representation in the latest iteration. iRe^2f reduces the number of refinement parameters by 90% compared to the previous smallest refiner. Experiments on a variety of language structure prediction tasks show that iRe^2f performs comparably to or better than current state-of-the-art refiners, with a significant increase in efficiency.

List of keywords

Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Tagging, chunking, and parsing

3319

Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting

Gaoang Wang, Mingli Song

Human pose forecasting is a sequential modeling task that aims to predict future poses from historical motions. Most existing approaches focus on the spatial-temporal neural network model design for learning movement patterns to reduce prediction errors. However, they usually do not strictly follow the temporal constraints in the inference stage. Even though a small Mean Per Joint Position Error (MPJPE) is achieved, some of the predicted poses are not temporally feasible solutions, which violates the continuity of body movement. In this paper, we consider temporally constrained feasible solutions for human pose forecasting, where the poses predicted from the input historical poses are guaranteed to obey the temporal constraints strictly in the inference stage. Rather than directly supervising the prediction in the original pose space, a temporally constrained subspace is explicitly learned and then followed by an inverse transformation to obtain the final predictions. We evaluate the proposed method on large-scale benchmarks, including Human3.6M, AMASS, and 3DPW. With the STS-GCN as the encoder backbone, state-of-the-art performance has been achieved with the temporally constrained feasible solutions.
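
For readers unfamiliar with the metric named above, MPJPE for a single pose is commonly computed as the average Euclidean distance between predicted and ground-truth joint positions. The snippet below is a minimal sketch under that common definition; the function name `mpjpe` and the list-of-tuples pose format are assumptions for illustration and are not taken from the paper.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching joints; each pose is a list of (x, y, z)."""
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# One joint is exact, the other is off by a 3-4-5 triangle, so the mean error is 2.5.
error = mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 3, 4)])
```

A low value of this per-frame error says nothing about how poses change between consecutive frames, which is why a forecast can score well on MPJPE and still violate temporal continuity.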

List of keywords

Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition

3349

Annealing Genetic Based Preposition Substitution for Text Rubbish Example Generation

Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen

Modern Natural Language Processing (NLP) models exhibit under-sensitivity towards text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes yield human-readable text. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish example generation with two major merits. Firstly, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which causes less degradation of the model’s confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose the fact that NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences.

List of keywords

Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Text classification

3352

Detecting Adversarial Faces Using Only Real Face Self-Perturbations

Qian Wang, Yongqin Xian, Hefei Ling, Jinyuan Zhang, Xiaorui Lin, Ping Li, Jiazhong Chen, Ning Yu

Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend against newly emerging attacks that are unseen by defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.

List of keywords

Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Applications
Computer Vision -> CV: Biometrics, face, gesture and pose recognition

3357

Spatially Constrained Adversarial Attack Detection and Localization in the Representation Space of Optical Flow Networks

Hannah Kim, Celia Cintas, Girmaw Abebe Tadesse, Skyler Speakman

Optical flow estimation has shown significant improvements with advances in deep neural networks. However, these flow networks have recently been shown to be vulnerable to patch-based adversarial attacks, which pose security risks in real-world applications, such as self-driving cars and robotics. We propose SADL, a Spatially constrained adversarial Attack Detection and Localization framework, to detect and localize these patch-based attacks without requiring dedicated training. The detection of an attacked input sequence is performed via iterative optimization on the features from the inner layers of flow networks, without any prior knowledge of the attacks. The novel spatially constrained optimization ensures that the detected anomalous subset of features comes from a local region. To this end, SADL provides a subset of nodes within a spatial neighborhood that contribute more to the detection, which is then used to localize the attack in the input sequence. The proposed SADL is validated across multiple datasets and flow networks. With patch attacks sized at 4.8% of the input image resolution on RAFT, our method successfully detects and localizes them with an average precision of 0.946 and 0.951 for the KITTI-2015 and MPI-Sintel datasets, respectively. The results show that SADL consistently achieves higher detection rates than existing methods and provides new localization capabilities.

List of keywords

Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods

3364

Fast Algorithms for SAT with Bounded Occurrences of Variables

Junqiang Peng, Mingyu Xiao

We present fast algorithms for the general CNF satisfiability problem (SAT) with running-time bound $O^*({c_{d}}^{n})$, where $c_d$ is a function of the average occurrence $d$ of variables, and $n$ is the number of variables in the input formula. Similar to SAT with bounded clause lengths, SAT with bounded occurrences of variables has also been extensively studied in the literature. In particular, the running-time bounds for small $d$, say $d=3$ and $4$, have become the bottlenecks for algorithms evaluated by the formula length $L$ and other algorithms. In this paper, we show that SAT can be solved in time $O^*(1.1238^n)$ for $d=3$ and $O^*(1.2628^n)$ for $d=4$, respectively improving the previous results $O^*(1.1279^n)$ and $O^*(1.2721^n)$ obtained by Wahlström (SAT 2005) nearly 20 years ago. For $d\geq 5$, we obtain the running-time bound $O^*(1.0641^{dn})$, which implies the bound $O^*(1.0641^{L})$ with respect to the formula length $L$. This result is also competitive with the previous result $O^*(1.0646^{L})$ by Peng and Xiao (SAT 2021).
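
The step from the $dn$ exponent to the $L$ exponent can be spelled out as follows, assuming (as is standard, though not stated in the abstract) that the formula length $L$ counts the total number of literal occurrences and writing $\operatorname{occ}(x_i)$, a notation introduced here for illustration, for the number of occurrences of variable $x_i$:

```latex
L \;=\; \sum_{i=1}^{n} \operatorname{occ}(x_i) \;=\; d \cdot n
\quad\Longrightarrow\quad
O^*\!\left(1.0641^{dn}\right) \;=\; O^*\!\left(1.0641^{L}\right)
```

The middle equality holds because $d$ is defined as the average number of occurrences per variable.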

List of keywords

Constraint Satisfaction and Optimization -> CSO: Satisfiability

3365

Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?

Ridhima Bector, Hang Xu, Abhay M S Aradhya, Chai Quek, Zinovi Rabinovich

As Reinforcement Learning (RL) solutions are becoming ubiquitous, so is the study of potential threats to their training and deployment. While single-learner training-time attacks, capable of "pre-programming" behavioral triggers into a strategy, receive increasing attention, attacks on collections of learning agents have been largely overlooked. We remedy the situation by developing a constructive training-time attack on a population of learning agents and make the attack agnostic to the size of the population. The attack constitutes a sequence of environment (re)parameterizations (poisonings), generated to overcome individual differences between agents and lead the entire population to the same target behavior while minimizing effective environment modulation. Our method is demonstrated on populations of independent learners in "ghost" environments (learners do not interact or perceive each other) as well as environments with mutual awareness, with or without individual learning. From the attack perspective, we pursue an ultra-blackbox setting, i.e., cross-policy traces of the victim learners are the only input both for attack conditioning and attack evaluation during the attacker’s training. To manage the resulting uncertainty in population behavior, we deploy a novel Wasserstein distance-based Gaussian embedding of detected behaviors within the population of victim learners. To align with prior works on environment poisoning, our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable.

List of keywords

Machine Learning -> ML: Adversarial machine learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Deep reinforcement learning

3369

Hierarchical Prompt Learning for Compositional Zero-Shot Recognition

Henan Wang, Muli Yang, Kun Wei, Cheng Deng

Compositional Zero-Shot Learning (CZSL) aims to imitate the powerful generalization ability of human beings to recognize novel compositions of known primitive concepts that correspond to a state and an object, e.g., purple apple. To fully capture the intra- and inter-class correlations between compositional concepts, in this paper, we propose to learn them in a hierarchical manner. Specifically, we set up three hierarchical embedding spaces that respectively model the states, the objects, and their compositions, which serve as three “experts” that can be combined in inference for more accurate predictions. We achieve this based on the recent success of large-scale pretrained vision-language models, e.g., CLIP, which provides a strong initial knowledge of image-text relationships. To better adapt this knowledge to CZSL, we propose to learn three hierarchical prompts by explicitly fixing the unrelated word tokens in the three embedding spaces. Despite its simplicity, our proposed method consistently yields superior performance over current state-of-the-art approaches on three widely-used CZSL benchmarks.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

3378

The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets

Xinru Wang, Chen Liang, Ming Yin

The use of AI-based decision aids in diverse domains has inspired many empirical investigations into how AI models’ decision recommendations impact humans’ decision accuracy in AI-assisted decision making, while explorations on the impacts on humans’ decision fairness are largely lacking despite its clear importance. In this paper, using a real-world business decision making scenario—bidding in rental housing markets—as our testbed, we present an experimental study on understanding how the bias level of the AI-based decision aid as well as the provision of AI explanations affect the fairness level of humans’ decisions, both during and after their usage of the decision aid. Our results suggest that when people are assisted by an AI-based decision aid, both the higher level of racial biases the decision aid exhibits and surprisingly, the presence of AI explanations, result in more unfair human decisions across racial groups. Moreover, these impacts are partly made through triggering humans’ “disparate interactions” with AI. However, regardless of the AI bias level and the presence of AI explanations, when people return to make independent decisions after their usage of the AI-based decision aid, their decisions no longer exhibit significant unfairness across racial groups.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Human-computer interaction

3386

Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs

Kaiwen Xu, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma

A concept-based classifier can explain the decision process of a deep learning model through human-understandable concepts in image classification problems. However, concept-based explanations may sometimes produce false positives, in which unrelated concepts are mistaken as important for the prediction task. Our goal is to find the statistically significant concepts for classification to prevent misinterpretation. In this study, we propose a method that uses a deep learning model to learn the image concepts and then uses knockoff samples to select the important concepts for prediction while controlling the False Discovery Rate (FDR) under a certain value. We evaluate the proposed method in experiments on both synthetic and real data. The results show that our method can control the FDR properly while selecting highly interpretable concepts to improve the trustworthiness of the model.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning

3395

OptIForest: Optimal Isolation Forest for Anomaly Detection

Haolong Xiang, Hongsheng Hu, Xiaolong Xu, Lianyong Qi, Wanchun Dou, Mark Dras, Amin Beheshti, Xuyun Zhang

Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use the binary structure, the LSHiForest framework has demonstrated that the multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on the theoretical underpinning, we design a practical optimal isolation forest, OptIForest, incorporating clustering-based learning to hash, which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning-based methods.

List of keywords

Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Ensemble methods

3398

Action Space Reduction for Planning Domains

Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi

Planning tasks succinctly represent labeled transition systems, with each ground action corresponding to a label. This granularity, however, is not necessary for solving planning tasks and can be harmful, especially for model-free methods. In order to apply such methods, the label sets are often manually reduced. In this work, we propose automating this manual process. We characterize a valid label reduction for classical planning tasks and propose an automated way of obtaining such valid reductions by leveraging lifted mutex groups. Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning.

List of keywords

Planning and Scheduling -> PS: Theoretical foundations of planning

3414

Dual Personalization on Federated Recommendation

Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus they inherently take the form of heavyweight models at the server, which hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views of item representations, which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed new light on the design of recommender systems in federated settings. The code is available.

List of keywords

Machine Learning -> ML: Federated learning
Data Mining -> DM: Privacy-preserving data mining
Data Mining -> DM: Recommender systems

3422

Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays

Kazi Ashik Islam, Da Qi Chen, Madhav Marathe, Henning Mortveit, Samarth Swarup, Anil Vullikanti

Evacuation planning is a crucial part of disaster management where the goal is to relocate people to safety and minimize casualties. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that combines heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion, as well as to find optimized plans considering such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of average evacuation time, evacuation completion time, and optimality guarantee of the solutions (13%, 21%, 58% improvement respectively). We then show that MIP-LNS-SIM outperforms MIP-LNS in terms of average evacuation time, evacuation completion time, and average time spent on the road (10%, 17%, 77% improvement respectively) when delay due to congestion is considered. In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time (6%) compared to MIP-LNS (76%).

List of keywords

Planning and Scheduling -> PS: Search in planning and scheduling
Agent-based and Multi-agent Systems -> MAS: Applications
Search -> S: Heuristic search

3423

New Algorithms for the Fair and Efficient Allocation of Indivisible Chores

Jugal Garg, Aniket Murhekar, John Qin

We study the problem of fairly and efficiently allocating indivisible chores among agents with additive disutility functions. We consider the widely used envy-based fairness properties of EF1 and EFX in conjunction with the efficiency property of fractional Pareto-optimality (fPO). Existence (and computation) of an allocation that is simultaneously EF1/EFX and fPO are challenging open problems, and we make progress on both of them. We show the existence of an allocation that is (i) EF1 + fPO, when there are three agents; (ii) EF1 + fPO, when there are at most two disutility functions; (iii) EFX + fPO, for three agents with bivalued disutility functions. These results are constructive, based on strongly polynomial-time algorithms. We also investigate non-existence and show that an allocation that is EFX + fPO need not exist, even for two agents.
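
To make the EF1 property concrete, the following minimal sketch checks it for chores under additive disutilities, using the usual definition: agent i does not envy agent j once some single chore is removed from i's own bundle. It only verifies a given allocation and is not one of the paper's algorithms; the function name `is_ef1_for_chores` and the dictionary-based input format are assumptions for illustration.

```python
def is_ef1_for_chores(allocation, disutility):
    """allocation[i] lists agent i's chores; disutility[i][c] >= 0 is i's cost for chore c."""

    def cost(agent, bundle):
        return sum(disutility[agent][c] for c in bundle)

    for i, own in allocation.items():
        own_cost = cost(i, own)
        # With additive costs, dropping the most burdensome chore is the best single removal.
        worst = max((disutility[i][c] for c in own), default=0)
        for j, other in allocation.items():
            if i != j and own_cost - worst > cost(i, other):
                return False
    return True


costs = {1: {"a": 1, "b": 1}, 2: {"a": 1, "b": 1}}
balanced = is_ef1_for_chores({1: ["a"], 2: ["b"]}, costs)
lopsided = is_ef1_for_chores({1: ["a", "b"], 2: []}, costs)
```

Here `balanced` is true, while `lopsided` is false because agent 1 still bears a cost of 1 after dropping one chore and agent 2 bears none.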

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice

3434

pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting

Yunyi Zhou, Zhixuan Chu, Yijia Ruan, Ge Jin, Yuchen Huang, Sheng Li

Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model relies heavily on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be averaged over different models straightforwardly, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on the Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.

List of keywords

Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Time series and data streams

3440

DiSProD: Differentiable Symbolic Propagation of Distributions for Planning

Palash Chatterjee, Ashutosh Chapagain, Weizhe Chen, Roni Khardon

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy’s value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.

List of keywords

Planning and Scheduling -> PS: Planning under uncertainty
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Robot planning

3449

K* Search Over Orbit Space for Top-k Planning

Michael Katz, Junkyu Lee

Top-k planning is a key formalism for many planning applications. K* search is a well-established approach to top-k planning. The algorithm iteratively runs A* search and Eppstein’s algorithm until a sufficient number of plans is found. The performance of the K* algorithm is therefore inherently limited by the performance of A*, and in order to improve K* performance, that of A* must be improved. In cost-optimal planning, orbit space search improves A* performance, essentially performing A* in the orbit space instead of the state space. In this work, we take a similar approach to top-k planning. We show a theoretical equivalence between the goal paths in the state space and in the orbit space, which allows K* search to be performed in the orbit space instead, reconstructing plans from the found paths. We prove that our algorithm is sound and complete for top-k planning and empirically show that it achieves state-of-the-art performance, overtaking all top-k planners existing to date.
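
The top-k problem itself can be illustrated with a much simpler procedure than K*. The sketch below enumerates the k cheapest start-to-goal paths by best-first search over partial paths. It is plain uniform-cost enumeration, not K* (there is no A* heuristic, no Eppstein's algorithm, and no orbit space), it assumes strictly positive edge costs, and it treats a path as finished as soon as it reaches the goal. The graph and the name `top_k_paths` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def top_k_paths(graph: Graph, start: str, goal: str,
                k: int) -> List[Tuple[float, List[str]]]:
    """Return up to k cheapest start-goal paths in nondecreasing cost order."""
    frontier: List[Tuple[float, List[str]]] = [(0.0, [start])]
    found: List[Tuple[float, List[str]]] = []
    while frontier and len(found) < k:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            # Partial paths are popped in cost order, so this is the next best plan.
            found.append((cost, path))
            continue
        for succ, weight in graph.get(node, []):
            heapq.heappush(frontier, (cost + weight, path + [succ]))
    return found


# Hypothetical state space with three distinct goal paths.
graph: Graph = {
    "s": [("a", 1.0), ("b", 2.0), ("g", 4.0)],
    "a": [("g", 2.0)],
    "b": [("g", 0.5)],
}
plans = top_k_paths(graph, "s", "g", k=2)
```

Because every partial path is kept explicitly, this enumeration can blow up on large state spaces, which is the kind of cost that more sophisticated top-k planners aim to avoid.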

List of keywords

Planning and Scheduling -> PS: Search in planning and scheduling
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Theoretical foundations of planning

3475

Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?

Bing Liu, Wei Luo, Gang Li, Jing Huang, Bo Yang

As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embeddings. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.

List of keywords

Data Mining -> DM: Networks
Machine Learning -> ML: Time series and data streams

3478

RaMLP: Vision MLP via Region-aware Mixing

Shenqi Lai, Xi Du, Jia Guo, Kaipeng Zhang

Recently, MLP-based architectures have achieved impressive results in image classification compared with CNNs and ViTs. However, they have an obvious limitation: their parameters are tied to image size, so they can process only fixed image sizes. Therefore, they cannot directly adapt to dense prediction tasks (e.g., object detection and semantic segmentation) where images are of various sizes. Recent methods tried to address this but brought two new problems: long-range dependencies or important visual cues are ignored. This paper presents a new MLP-based architecture, Region-aware MLP (RaMLP), to satisfy various vision tasks and address the above three problems. In particular, we propose a well-designed module, Region-aware Mixing (RaM). RaM captures important local information and further aggregates these important visual cues. Based on RaM, RaMLP achieves a global receptive field even in one block. It is worth noting that, unlike most existing MLP-based architectures that apply the same spatial weights to all samples, RaM is region-aware and adaptively determines weights to better extract region-level features. Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks.
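
The dependence of parameters on image size can be seen with a short calculation. The sketch below counts the weights of a hypothetical single fully connected layer that mixes information across all patch tokens, in the style of token-mixing MLPs; it is not the RaM module, and the patch size of 16 and the function name are assumptions for illustration.

```python
def token_mixing_params(height: int, width: int, patch: int = 16) -> int:
    """Parameter count of one dense layer mixing N patch tokens (N x N weights + N biases)."""
    tokens = (height // patch) * (width // patch)
    return tokens * tokens + tokens


# The same layer needs a different weight matrix for each input resolution.
params_224 = token_mixing_params(224, 224)  # 14 x 14 = 196 tokens
params_384 = token_mixing_params(384, 384)  # 24 x 24 = 576 tokens
```

Since the weight matrix is indexed by token position, a layer trained at one resolution cannot be applied unchanged at another, which is the limitation the abstract refers to.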

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning

3482

Social Motivation for Modelling Other Agents under Partial Observability in Decentralised Training

Dung Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran

Understanding other agents is a key challenge in constructing artificial social agents. Current works focus on centralised training, wherein agents are allowed to know all the information about others and the environmental state during training. In contrast, this work studies decentralised training, wherein agents must learn the model of other agents in order to cooperate with them under partially-observable conditions, even during training, i.e., the learning agents are myopic. The intrinsic motivation for artificial agents is modelled on the concept of human social motivation that entices humans to meet and understand each other, especially when experiencing a utility loss. Our intrinsic motivation encourages agents to stay near each other to obtain better observations and construct a model of others. They do so when their model of other agents is poor, or the overall task performance is bad during the learning phase. This simple but effective method facilitates the process of modelling others, significantly improving performance in cooperative tasks. Our experiments demonstrate that the socially-motivated agent can model others better and promote cooperation across different tasks.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Other

3497

JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval

Haojie Wei, Yuan Jun, Rui Zhang, Yueguo Chen, Gang Wang

Melody extraction is a core task in music information retrieval, and estimating pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of Pitch, Onset and Offset, respectively, and that JEPOO is robust for various types of data and instruments. The ablation study validates the effectiveness of each component of JEPOO.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Arts and creativity
Multidisciplinary Topics and Applications -> MDA: Entertainment
Multidisciplinary Topics and Applications -> MDA: Other

3510

Boosting Few-Shot Open-Set Recognition with Multi-Relation Margin Loss

Yongjuan Che, Yuexuan An, Hui Xue

Few-shot open-set recognition (FSOSR) has become a great challenge, which requires classifying known classes and rejecting the unknown ones with only limited samples. Existing FSOSR methods mainly construct an ambiguous distribution of known classes from scarce known samples without considering the latent distribution information of unknowns, which degrades the performance of open-set recognition. To address this issue, we propose a novel loss function called multi-relation margin (MRM) loss that can be plugged into few-shot methods to boost the performance of FSOSR. MRM enlarges the margin between different classes by extracting the multi-relationship of paired samples to dynamically refine the decision boundary for known classes and implicitly delineate the distribution of unknowns. Specifically, MRM separates the classes by enforcing a margin while concentrating samples of the same class on a hypersphere with a learnable radius. In order to better capture the distribution information of each class, MRM extracts the similarity and correlations among paired samples, improving the optimization of the margin and radius. Experiments on public benchmarks reveal that methods with MRM loss can improve the AUROC of unknown detection by a significant margin while correctly classifying the known classes.

List of keywords

Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Few-shot learning

3525

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang

Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must go through all consecutive layers before early exiting, and more complex samples usually go through more layers, so redundant computation still exists. In this paper, we propose a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and combine it into our training phases to boost the intermediate layers and classifiers, which is beneficial for early exiting. To keep the usage of skipping gates consistent between the training and inference phases, we propose a hard weight mechanism during the training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves 2-3× computation reduction with minimal accuracy drops compared with BERT, and our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets, we prove that early exiting based on entropy hardly works, and that the skipping mechanism is essential for reducing computation.
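
The entropy-based early exiting that this abstract refers to can be sketched as follows: each layer has a classifier, and inference stops at the first layer whose predictive entropy falls below a threshold. This is a generic illustration only; it does not model SmartBERT's skipping gates or training procedure, and the logits, threshold values, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def exit_layer(per_layer_logits: Sequence[Sequence[float]], threshold: float) -> int:
    """Index of the first layer confident enough to exit, else the last layer."""
    for idx, logits in enumerate(per_layer_logits):
        if entropy(softmax(logits)) < threshold:
            return idx
    return len(per_layer_logits) - 1


# Hypothetical per-layer classifier logits for one sample (binary task).
logits_by_layer = [[0.1, 0.0], [2.0, 0.0], [6.0, 0.0]]
early = exit_layer(logits_by_layer, threshold=0.4)
late = exit_layer(logits_by_layer, threshold=0.01)
```

With the looser threshold the sample exits at the second layer; with the stricter one no layer is confident enough, so the full depth is used. Note that every layer up to the exit point is still executed, which is the redundancy the abstract points out.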

List of keywords

Natural Language Processing -> NLP: Language models
Natural Language Processing -> NLP: Text classification

3526

Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow

Xiaolin Zheng, Mengpu Liu, Mengying Zhu

In financial scenarios, influenced by common factors such as global macroeconomic and sector-specific factors, stocks exhibit varying degrees of correlation with each other, which is essential in risk-averse portfolio allocation. Because the real risk matrix is unobservable, the covariance-based correlation matrix is widely used for constructing diversified stock portfolios. However, few studies focus on dynamic correlation matrix estimation under the non-stationary financial market. Moreover, as the number of stocks grows, the training process of existing correlation matrix estimation methods becomes significantly more complicated and slower. In this paper, we propose a novel hash-based dynamic correlation forecasting model (HDCF) to estimate the dynamic stock correlation. Under a structural assumption of sparsity and slowly varying evolution, HDCF learns the hash representation of the correlation matrix, which performs extremely efficiently in high-dimensional settings. Experiments show that our proposed model outperforms baselines on portfolio decisions.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Representation learning

3531

Guide to Control: Offline Hierarchical Reinforcement Learning using Subgoal Generation for Long-Horizon and Complex Tasks

Wonchul Shin, Yusung Kim

Reinforcement learning (RL) has achieved considerable success in many fields, but applying it to real-world problems can be costly and risky because it requires a lot of online interaction. Recently, offline RL has shown the possibility of extracting a solution from existing logged data without online interaction. In this work, we propose an offline hierarchical RL method, Guider (Guide to Control), that can efficiently solve long-horizon and complex tasks from offline data. The high-level policy sequentially generates subgoals that guide the agent to the final goal, and the low-level policy learns how to reach each generated subgoal. In the process of learning from offline data, the key is to ensure that the low-level policy can reach the generated subgoals. We show that high-quality subgoal generation is possible through pre-training a latent subgoal prior model. The well-regulated subgoal generation improves performance while avoiding distributional shifts in offline RL by breaking down long, complex tasks into shorter, easier ones. In evaluations, Guider outperforms prior offline RL methods on long-horizon robot navigation and complex manipulation benchmarks. Our code is available at "this".

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Behavior and control

3535

Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data

Qing Xu, Min Wu, Xiaoli Li, Kezhi Mao, Zhenghua Chen

For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between the model training (source) and deployment (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased toward source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called UNiversal and joInt Knowledge Distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align the teacher’s and student’s representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks. The source code is available at https://github.com/ijcai2023/UNI KD.

List of keywords

Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Unsupervised learning

3540

FedNoRo: Towards Noise-Robust Federated Learning by Addressing Class Imbalance and Label Noise Heterogeneity

Nannan Wu, Li Yu, Xuefeng Jiang, Kwang-Ting Cheng, Zengqiang Yan

Federated noisy label learning (FNLL) is emerging as a promising tool for privacy-preserving multi-source decentralized learning. Existing research, relying on the assumption of class-balanced global data, might be incapable of modeling complicated label noise, especially in medical scenarios. In this paper, we first formulate a new and more realistic federated label noise problem where global data is class-imbalanced and label noise is heterogeneous, and then propose a two-stage framework named FedNoRo for noise-robust federated learning. Specifically, in the first stage of FedNoRo, per-class loss indicators followed by a Gaussian Mixture Model are deployed for noisy client identification. In the second stage, knowledge distillation and a distance-aware aggregation function are jointly adopted for noise-robust federated model updating. Experimental results on the widely-used ICH and ISIC2019 datasets demonstrate the superiority of FedNoRo over state-of-the-art FNLL methods for addressing class imbalance and label noise heterogeneity in real-world FL scenarios.

List of keywords

Machine Learning -> ML: Federated learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Robustness

3545

Graph Neural Convection-Diffusion with Heterophily

Kai Zhao, Qiyu Kang, Yang Song, Rui She, Sijie Wang, Wee Peng Tay

Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs. On heterophilic graphs, connected nodes are likely to be from different classes or have dissimilar features. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to take into account both the diffusion of information due to homophily and the “convection” of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs compared to state-of-the-art methods.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Classification

3548

Analyzing and Combating Attribute Bias for Face Restoration

Zelin Li, Dan Zeng, Xiao Yan, Qiaomu Shen, Bo Tang

Face restoration (FR) recovers high resolution (HR) faces from low resolution (LR) faces and is challenging due to its ill-posed nature. With years of development, existing methods can produce quality HR faces with realistic details. However, we observe that key facial attributes (e.g., age and gender) of the restored faces could be dramatically different from the LR faces and call this phenomenon attribute bias, which is fatal when using FR for applications such as surveillance and security. Thus, we argue that FR should consider not only image quality as in existing works but also attribute bias. To this end, we thoroughly analyze attribute bias with extensive experiments and find that two major causes are the lack of attribute information in LR faces and bias in the training data. Moreover, we propose the DebiasFR framework to produce HR faces with high image quality and accurate facial attributes. The key design is to explicitly model the facial attributes, which also allows the facial attributes of the output HR faces to be adjusted. Experimental results show that DebiasFR has comparable image quality but significantly smaller attribute bias when compared with state-of-the-art FR methods.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Bias, fairness and privacy
Computer Vision -> CV: Neural generative models, auto encoders, GANs

3556

Not Only Pairwise Relationships: Fine-Grained Relational Modeling for Multivariate Time Series Forecasting

Jinming Wu, Qi Qi, Jingyu Wang, Haifeng Sun, Zhikang Wu, Zirui Zhuang, Jianxin Liao

Recent graph-based methods achieve significant success in multivariate time series modeling and forecasting due to their ability to handle relationships among time series variables. However, only pairwise relationships are considered in most existing works. They ignore beyond-pairwise relationships and their potential categories in practical scenarios, which leads to incomplete relationship learning for multivariate time series forecasting. In this paper, we present ReMo, a Relational Modeling-based method, to promote fine-grained relational learning among multivariate time series data. Firstly, by treating time series variables and complex relationships as nodes and hyperedges, we extract multi-view hypergraphs from data to capture beyond-pairwise relationships. Secondly, a novel hypergraph message passing strategy is designed to characterize both nodes and hyperedges by inferring the potential categories of relationships and further distinguishing their impacts on time series variables. By integrating these two modules into the time series forecasting framework, ReMo effectively improves the performance of multivariate time series forecasting. The experimental results on seven commonly used datasets from different domains demonstrate the superiority of our model.

List of keywords

Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining spatial and/or temporal data

3562

Acoustic NLOS Imaging with Cross Modal Knowledge Distillation

Ui-Hyeon Shin, Seungwoo Jang, Kwang-su Kim

Acoustic non-line-of-sight (NLOS) imaging aims to reconstruct hidden scenes by analyzing reflections of acoustic waves. Despite recent developments in the field, existing methods still have limitations such as sensitivity to noise in a physical model and difficulty in reconstructing unseen objects in a deep learning model. To address these limitations, we propose a novel cross-modal knowledge distillation (CMKD) approach for acoustic NLOS imaging. Our method transfers knowledge from a well-trained image network to an audio network, effectively combining the strengths of both modalities. As a result, it is robust to noise and superior in reconstructing unseen objects. Additionally, we evaluate on real-world datasets and demonstrate that the proposed method outperforms state-of-the-art methods in acoustic NLOS imaging. The experimental results indicate that CMKD is an effective solution for addressing the limitations of current acoustic NLOS imaging methods.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Multi-modal learning

3566

GLPocket: A Multi-Scale Representation Learning Approach for Protein Binding Site Prediction

Peiying Li, Yongchang Liu, Shikui Tu, Lei Xu

Protein binding site prediction is an important prerequisite for the discovery of new drugs. Usually, a natural 3D U-Net is adopted as the standard site prediction framework to do per-voxel binary mask classification. However, this scheme only performs feature extraction for single-scale samples, which may cause the loss of global or local information, resulting in incomplete, artifact-laden or even missed predictions. To tackle this issue, we propose a network called GLPocket, which is based on the 3D U-Net structure and utilizes multi-scale representation to predict binding sites. Firstly, GLPocket uses a Target Cropping Block (TCB) for targeted prediction. TCB selects the local representation of interest from the global representations to perform concentrated prediction, and reduces the amount of computation by 82%. It integrates global distribution information into local regions, making prediction more concentrated in the decoding stage. Secondly, GLPocket establishes long-range relationships among patches within the local region with a Transformer Block (TB), to enrich local context semantic information. Experiments show that GLPocket improves by 0.5%-4% on DCA Top-n prediction compared with previous state-of-the-art methods on four benchmark data sets. We will publish source code on GitHub after the article is accepted.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Computer Vision -> CV: Biomedical image analysis

3572

DeepPSL: End-to-end perception and reasoning

Sridhar Dasaratha, Sai Akhil Puranam, Karmvir Phogat, Sunil Tiyyagura, Nigel Duffy

We introduce DeepPSL, a variant of probabilistic soft logic (PSL), to produce an end-to-end trainable system that integrates reasoning and perception. PSL represents first-order logic in terms of a convex graphical model, the hinge-loss Markov random field (HL-MRF). PSL stands out among probabilistic logic frameworks due to its tractability, having been applied to systems of more than 1 billion ground rules. The key to our approach is to represent predicates in first-order logic using deep neural networks and then to approximately back-propagate through the HL-MRF and thus train every aspect of the first-order system being represented. We believe that this approach represents an interesting direction for the integration of deep learning and reasoning techniques with applications to knowledge base learning, multi-task learning, and explainability. Evaluation on three different tasks demonstrates that DeepPSL significantly outperforms state-of-the-art neuro-symbolic methods on scalability while achieving comparable or better accuracy.

List of keywords

Machine Learning -> ML: Neuro-symbolic methods
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Learning graphical models

3573

Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

Zheyu Zhang, Tianping Zhang, Jian Li

Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and CatBoost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
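
For context on what a split gain is, the sketch below computes the conventional second-order split gain used in XGBoost-style GBDT implementations from the gradient and Hessian sums of the two children. It is the standard, potentially biased estimate evaluated on the same data used to pick the split, not the paper's unbiased gain; the function name, the regularisation value, and the example numbers are assumptions.

```python
def split_gain(g_left: float, h_left: float,
               g_right: float, h_right: float,
               reg_lambda: float = 1.0) -> float:
    """Standard second-order split gain (without the per-leaf complexity penalty).

    g_* and h_* are sums of first- and second-order loss derivatives
    over the samples falling into the left and right child.
    """
    def score(g: float, h: float) -> float:
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)


# Hypothetical gradient statistics for one candidate split.
gain = split_gain(g_left=-4.0, h_left=2.0, g_right=3.0, h_right=3.0)
```

A feature with many candidate thresholds gets many chances to produce a large value of this quantity by chance, which is one intuition behind the bias discussed in the abstract.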

List of keywords

Machine Learning -> ML: Applications
Machine Learning -> ML: Classification

3636

Semi-supervised Domain Adaptation in Graph Transfer Learning

Ziyue Qiao, Xiao Luo, Meng Xiao, Hao Dong, Yuanchun Zhou, Hui Xiong

As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding. Thus, the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Semi-supervised learning

3639

Don’t Ignore Alienation and Marginalization: Correlating Fraud Detection

Yilong Zang, Ruimin Hu, Zheng Wang, Xu Danni, Jia Wu, Dengshi Li, Junhang Wu, Lingfei Ren

The anonymity of online networks makes tackling fraud increasingly costly. Thanks to the superiority of graph representation learning, graph-based fraud detection has made significant progress in recent years. However, upgraded fraudulent strategies produce more advanced and difficult scams. One common strategy is synergistic camouflage: combining multiple means to deceive others. Existing methods mostly investigate the differences between relations on individual frauds, neglecting the correlation among multi-relation fraudulent behaviors. In this paper, we design several statistics to validate the existence of synergistic camouflage of fraudsters by exploring the correlation among multi-relation interactions. From the multi-relation perspective, we find two distinctive features of fraudulent behaviors, i.e., alienation and marginalization. Based on this finding, we propose COFRAUD, a correlation-aware fraud detection model, which innovatively incorporates synergistic camouflage into fraud detection. It captures the correlation among multi-relation fraudulent behaviors. Experimental results on two public datasets demonstrate that COFRAUD achieves significant improvements over state-of-the-art methods.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Security and privacy
Data Mining -> DM: Applications

3654

Dual Prompt Learning for Continual Rain Removal from Single Images

Minghao Liu, Wenhan Yang, Yuzhang Hu, Jiaying Liu

Recent efforts have achieved remarkable progress on single image deraining on stationary distributed data. However, catastrophic forgetting raises practical concerns when applying these methods to real applications, where the data distributions change constantly. In this paper, we investigate the continual learning issue for rain removal and develop a novel, efficient continually learned deraining transformer. Different from the typical replay or regularization-based methods that increase overall training time or parameter space, our method relies on compact prompts, which are learnable parameters, to maintain both task-invariant and task-specific knowledge. Our prompts are applied at both image and feature levels to effectively leverage transferred knowledge of images and features among different tasks. We conduct comprehensive experiments on widely-used rain removal datasets, where our proposed dual prompt learning consistently outperforms prior state-of-the-art methods. Moreover, we observe that, even though our method is designed for continual learning, it still achieves superior results on stationary distributed data, which further demonstrates the effectiveness of our method.

List of keywords

Computer Vision -> CV: Computational photography

3655

Open-world Semi-supervised Novel Class Discovery

Jiaming Liu, Yangqiming Wang, Tongze Zhang, Yulu Fan, Qinli Yang, Junming Shao

Traditional semi-supervised learning tasks assume that both labeled and unlabeled data follow the same class distribution, but realistic open-world scenarios are more complex, with unknown novel classes mixed into the unlabeled set. It is therefore a great challenge not only to recognize samples from known classes but also to discover an arbitrary number of novel classes from the unlabeled data. In this paper, we introduce a new open-world semi-supervised novel class discovery approach named OpenNCD, a progressive bi-level contrastive learning method over multiple prototypes. The proposed method is composed of two reciprocally enhanced parts. First, a bi-level contrastive learning method is introduced, which maintains the pair-wise similarity at the prototype and prototype-group levels for better representation learning. Then, a reliable prototype similarity metric is proposed based on the common representing instances. Prototypes with high similarities are grouped progressively for known class matching and novel class discovery. Extensive experiments on three image datasets show the effectiveness of the proposed method in open-world scenarios, especially with fewer known classes and labels.

List of keywords

Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Self-supervised Learning

3663

Diverse Approximations for Monotone Submodular Maximization Problems with a Matroid Constraint

Anh Do, Mingyu Guo, Aneta Neumann, Frank Neumann

Finding diverse solutions to optimization problems has been of practical interest for several decades, and has recently enjoyed increasing attention in research. While submodular optimization has been rigorously studied in many fields, its extension to diverse solutions has not. In this study, we consider the most basic variants of submodular optimization, and propose two simple greedy algorithms, which are known to be effective at maximizing monotone submodular functions. These are equipped with parameters that control the trade-off between objective and diversity. Our theoretical contribution shows their approximation guarantees in both objective value and diversity, as functions of their respective parameters. Our experimental investigation with maximum coverage instances demonstrates their empirical differences in terms of objective-diversity trade-offs.
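
As background for the abstract above, the following is a minimal sketch of the classic greedy algorithm for maximum coverage, a standard monotone submodular objective. It assumes a cardinality constraint (a uniform matroid, the simplest matroid constraint) and does not include the diversity parameters proposed in the paper. The function name `greedy_max_coverage` and the toy instance are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Pick up to k sets, each time taking the one with the largest marginal coverage."""
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    for _ in range(k):
        best_key, best_gain = None, 0
        for key, items in sets.items():
            if key in chosen:
                continue
            gain = len(items - covered)  # marginal gain of adding this set
            if gain > best_gain:
                best_key, best_gain = key, gain
        if best_key is None:  # no remaining set covers anything new
            break
        chosen.append(best_key)
        covered |= sets[best_key]
    return chosen


# Hypothetical toy instance: four candidate sets over the elements 1..7.
instance = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
solution = greedy_max_coverage(instance, k=2)
coverage = set().union(*(instance[key] for key in solution))
print(solution, len(coverage))  # ['c', 'a'] 7
```

On this instance the greedy rule first takes `c` (four new elements) and then `a` (three new elements), covering all seven elements with two sets.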

List of keywords

Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search

3667

ActUp: Analyzing and Consolidating tSNE and UMAP

Andrew Draganov, Jakob Jørgensen, Katrine Scheel, Davide Mottin, Ira Assent, Tyrus Berry, Cigdem Aslay

TSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study their full span of differences. We theoretically and experimentally evaluate the space of parameters in the TSNE and UMAP algorithms and observe that a single one — the normalization — is responsible for switching between them. This, in turn, implies that a majority of the algorithmic differences can be toggled without affecting the embeddings. We discuss the implications this has on several theoretic claims behind UMAP, as well as how to reconcile them with existing TSNE interpretations. Based on our analysis, we provide a method (GDR) that combines previously incompatible techniques from TSNE and UMAP and can replicate the results of either algorithm. This allows our method to incorporate further improvements, such as an acceleration that obtains either method’s outputs faster than UMAP. We release improved versions of TSNE, UMAP, and GDR that are fully plug-and-play with the traditional libraries.

List of keywords

Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Data Mining -> DM: Data visualization
Machine Learning -> ML: Unsupervised learning

3671

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan

Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods have emerged to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
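
To illustrate the Sinkhorn operator named among the components above, here is a minimal pure-Python sketch of Sinkhorn normalisation, which alternately rescales the rows and columns of a positive matrix until it is approximately doubly stochastic. The abstract does not state how LightTEA configures this step, so the iteration count, the exponential kernel, and the toy similarity matrix are assumptions made purely for illustration.

```python
import math
from typing import List


def sinkhorn(similarity: List[List[float]], iterations: int = 50) -> List[List[float]]:
    """Alternately normalise rows and columns of exp(similarity)."""
    # Exponentiate so that every entry is strictly positive.
    matrix = [[math.exp(value) for value in row] for row in similarity]
    for _ in range(iterations):
        # Row step: rescale each row so that it sums to 1.
        row_normalised = []
        for row in matrix:
            total = sum(row)
            row_normalised.append([value / total for value in row])
        # Column step: rescale each column so that it sums to 1.
        col_sums = [sum(col) for col in zip(*row_normalised)]
        matrix = [
            [value / col_sums[j] for j, value in enumerate(row)]
            for row in row_normalised
        ]
    return matrix


# Hypothetical similarity scores between three source and three target entities.
scores = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]]
plan = sinkhorn(scores)
matches = [max(range(len(row)), key=row.__getitem__) for row in plan]
print(matches)  # [0, 1, 2]
```

Each row of `plan` can be read as a soft assignment of one source entity over the target entities; taking the row-wise maximum gives a hard alignment.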

List of keywords

Natural Language Processing -> NLP: Information retrieval and text mining

3675

CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

WeiTao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie

The core of Multi-view Stereo (MVS) is the matching process among reference pixels and corresponding pixels on the epipolar lines of source views. Cost aggregation on the cost volume plays a significant role in the matching process, and previous learning-based MVS methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs, which fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to bring the long-range dependent representation of Transformer into the MVS pipeline as an alternative to the widely used 3D CNN in the cost aggregation process. However, another problem may occur due to the quadratically growing computational complexity caused by Transformer, resulting in unexpected memory overflow and inference latency. The huge variation in views may further aggravate the problem. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on the cost volume via self-attention mechanisms along the depth, width, and height dimensions. Furthermore, we also propose a Residual Regression Transformer (RRT) to enhance spatial attention. For efficient computation, self-attention is conducted within the local windows and then shifted to construct the global attention. The proposed method can be a universal plug-in to improve learning-based MVS methods effectively.

List of keywords

Computer Vision -> CV: 3D computer vision

3686

Incremental and Decremental Optimal Margin Distribution Learning

Li-Jun Chen, Teng Zhang, Xuanhua Shi, Hai Jin

Incremental and decremental learning (IDL) deals with tasks where new data arrives sequentially as a stream or old data continually becomes unavailable due to privacy protection. Existing IDL methods mainly focus on the support vector machine and its variants with linear-type loss. There are few studies on the quadratic-type loss, whose Lagrangian multipliers are unbounded and much more difficult to track than linear ones. In this paper, we take the recent statistical learning framework, the optimal margin distribution machine (ODM), which involves a quadratic-type loss due to the optimization of margin variance, as an example and equip it with the ability to handle IDL tasks. Our proposed ID-ODM avoids updating over an infinite range by determining the optimal value beforehand, and is thus much more efficient. Besides, ID-ODM is also applicable when multiple instances arrive and leave simultaneously. Extensive empirical studies show that ID-ODM can achieve a 9.3x speedup on average with almost no loss of generalization compared to retraining ODM on the new data set from scratch.

List of keywords

Machine Learning -> ML: Classification
Machine Learning -> ML: Incremental learning

3697

Neural Capacitated Clustering

Jonas Falkner, Lars Schmidt-Thieme

Recent work on deep clustering has found new promising methods also for constrained clustering problems. Their typically pairwise constraints often can be used to guide the partitioning of the data. Many problems, however, feature cluster-level constraints, e.g. the Capacitated Clustering Problem (CCP), where each point has a weight and the total weight sum of all points in each cluster is bounded by a prescribed capacity. In this paper we propose a new method for the CCP, Neural Capacitated Clustering, that learns a neural network to predict the assignment probabilities of points to cluster centers from a data set of optimal or near optimal past solutions of other problem instances. During inference, the resulting scores are then used in an iterative k-means like procedure to refine the assignment under capacity constraints. In our experiments on artificial data and two real world datasets our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature. Moreover, we apply our method in the context of a cluster-first-route-second approach to the Capacitated Vehicle Routing Problem (CVRP) and show competitive results on the well-known Uchoa benchmark.

List of keywords

Machine Learning -> ML: Clustering
Constraint Satisfaction and Optimization -> CSO: Applications
Machine Learning -> ML: Geometric learning

3704

Flaws of Termination and Optimality in ADOPT-based Algorithms

Koji Noshiro, Koji Hasebe

A distributed constraint optimization problem (DCOP) is a framework to model multi-agent coordination problems. Asynchronous distributed optimization (ADOPT) is a well-known complete DCOP algorithm, and owing to its superior characteristics, many variants have been proposed over the last decade. It is considered proven that ADOPT-based algorithms have the key properties of termination and optimality, which guarantee that the algorithms terminate in a finite time and obtain an optimal solution, respectively. In this paper, we present counterexamples to the termination and optimality of ADOPT-based algorithms. The flaws are classified into three types, at least one of which exists in each of ADOPT and seven of its variants that we analyzed. In other words, the algorithms may fail to terminate or may terminate with a suboptimal solution. We also propose an amended version of ADOPT that avoids the flaws in existing algorithms and prove that it has the properties of termination and optimality.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Distributed constraints
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization

3705

On the Role of Memory in Robust Opinion Dynamics

Luca Becchetti, Andrea Clementi, Amos Korman, Francesco Pasquale, Luca Trevisan, Robin Vacus

We investigate opinion dynamics in a fully-connected system, consisting of n agents, where one of the opinions, called correct, represents a piece of information to disseminate. One source agent initially holds the correct opinion and remains with this opinion throughout the execution. The goal of the remaining agents is to quickly agree on this correct opinion. At each round, one agent chosen uniformly at random is activated: unless it is the source, the agent pulls the opinions of l random agents and then updates its opinion according to some rule. We consider a restricted setting, in which agents have no memory and they only revise their opinions on the basis of those of the agents they currently sample. This setting encompasses very popular opinion dynamics, such as the voter model and best-of-k majority rules. Qualitatively speaking, we show that lack of memory prevents efficient convergence. Specifically, we prove that any dynamics requires $\Omega(n^2)$ expected time, even under a strong version of the model in which activated agents have complete access to the current configuration of the entire system, i.e., the case l=n. Conversely, we prove that the simple voter model (in which l=1) correctly solves the problem, while almost matching the aforementioned lower bound. These results suggest that, in contrast to symmetric consensus problems (that do not involve a notion of correct opinion), fast convergence on the correct opinion using stochastic opinion dynamics may require the use of memory.
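
The voter model with a single source, as described above, can be sketched as a small simulation. This is an illustration only, under stated assumptions: opinions are binary, agent 0 is the source, and the activated agent samples uniformly from all n agents (possibly itself). None of these details are fixed by the abstract, and the function name `voter_model_rounds` is hypothetical.

```python
import random


def voter_model_rounds(n: int, seed: int = 0) -> int:
    """Simulate the voter model (l = 1) with one source; return rounds until consensus."""
    rng = random.Random(seed)
    opinions = [0] * n  # 0 = incorrect opinion
    opinions[0] = 1     # agent 0 is the source and always holds the correct opinion
    correct = 1         # number of agents currently holding the correct opinion
    rounds = 0
    while correct < n:
        rounds += 1
        agent = rng.randrange(n)    # activate one agent uniformly at random
        if agent == 0:
            continue                # the source never changes its opinion
        sampled = rng.randrange(n)  # pull the opinion of one random agent
        correct += opinions[sampled] - opinions[agent]
        opinions[agent] = opinions[sampled]  # memoryless update: copy the sample
    return rounds


rounds = voter_model_rounds(20, seed=0)
print(f"consensus on the correct opinion after {rounds} rounds")
```

Running the function for increasing n and averaging over several seeds gives an empirical feel for how the convergence time grows with the number of agents.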

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Agent communication

3709

Sub-Band Based Attention for Robust Polyp Segmentation

Xianyong Fang, Yuqing Shi, Qingqing Guo, Linbo Wang, Zhengyi Liu

This article proposes a novel spectral-domain solution to the challenging problem of polyp segmentation. The main contribution is based on an interesting finding of the significant existence of the middle frequency sub-band during the CNN process. Consequently, a Sub-Band based Attention (SBA) module is proposed, which uniformly adopts either the high or middle sub-bands of the encoder features to boost the decoder features and thus concretely improve the feature discrimination. A strong encoder supplying informative sub-bands is also very important, and we highly value CNN features enriched with local and global information. Therefore, a Transformer Attended Convolution (TAC) module is introduced as the main encoder block. It takes the Transformer features to boost the CNN features with stronger long-range object contexts. The combination of SBA and TAC leads to a novel polyp segmentation framework, SBA-Net. It adopts TAC to effectively obtain encoded features, which are also input to SBA, so that efficient sub-band based attention maps can be generated for progressively decoding the bottleneck features. Consequently, SBA-Net achieves robust polyp segmentation, as the experimental results demonstrate.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Data Mining -> DM: Frequent pattern mining
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Applications

3724

Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium

Yuma Fujimoto, Kaito Ariu, Kenshi Abe

Repeated games consider a situation where multiple agents are motivated by their independent rewards throughout learning. In general, the dynamics of their learning become complex. Especially when their rewards compete with each other, as in zero-sum games, the dynamics often do not converge to their optimum, i.e., the Nash equilibrium. To tackle such complexity, many studies have treated various learning algorithms as dynamical systems and discovered qualitative insights among the algorithms. However, such studies have yet to handle multi-memory games (where agents can memorize actions they played in the past and choose their actions based on their memories), even though memorization plays a pivotal role in artificial intelligence and interpersonal relationships. This study extends two major learning algorithms in games, i.e., replicator dynamics and gradient ascent, to multi-memory games. Then, we prove their dynamics are identical. Furthermore, theoretically and experimentally, we clarify that the learning dynamics diverge from the Nash equilibrium in multi-memory zero-sum games and reach heteroclinic cycles (sojourn longer around the boundary of the strategy space), providing a fundamental advance in learning in games.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent theories and models

3761

Tractable Diversity: Scalable Multiperspective Ontology Management via Standpoint EL

Lucia Gomez, Sebastian Rudolph, Hannes Strass

The tractability of the lightweight description logic EL has allowed for the construction of large and widely used ontologies that support semantic interoperability. However, comprehensive domains with a broad user base are often at odds with strong axiomatisations otherwise useful for inferencing, since these are usually context-dependent and subject to diverging perspectives. In this paper we introduce Standpoint EL, a multi-modal extension of EL that allows for the integrated representation of domain knowledge relative to diverse, possibly conflicting standpoints (or contexts), which can be hierarchically organised and put in relation to each other. We show that Standpoint EL still exhibits EL’s favourable polytime standard reasoning, whereas introducing additional features such as empty standpoints, rigid roles, and nominals makes standard reasoning tasks intractable.

List of keywords

Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief

3814

Multi-Scale Subgraph Contrastive Learning

Yanbei Liu, Yu Zhao, Xiao Wang, Lei Geng, Zhitao Xiao

Graph-level contrastive learning, which aims to learn the representation of each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply treat a graph and its augmented graph as a positive pair, and any other pair as a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with that of the original graph structure, and that whether two augmented graphs are positive or negative pairs is highly related to the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyses on eight real-world graph classification datasets demonstrate the effectiveness of the proposed method.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning

3822

SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension

Yixuan Tang, Anthony Tung

Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval, which automatically extracts the answer from the text-form contents according to the audio-form question. However, the existing spoken question answering approaches are mainly based on synthetically generated audio-form data, which may not transfer effectively to multi-accent spoken question answering in many real-world applications. In this paper, we construct a large-scale multi-accent human spoken dataset, SQuAD-SRC, in order to study the problem of multi-accent spoken reading comprehension. We choose 24 native English speakers from six different countries with various English accents, and the chosen speakers record audio-form questions for the corresponding text-form contents. The dataset consists of 98,169 spoken question answering pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of our SQuAD-SRC dataset and conduct extensive experiments on it by comparing cascaded SRC approaches and the enhanced end-to-end ones. Moreover, we explore various adaptation strategies to improve the SRC performance, especially for multi-accent spoken questions.

List of keywords

Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Speech

3832

Learning When to Use Automatic Tabulation in Constraint Model Reformulation

Carlo Cena, Zeynep Kiziltan, Peter Nightingale, Ian Miguel, Felix Ulrich-Oltean, Özgür Akgün

Combinatorial optimisation has numerous practical applications, such as planning, logistics, or circuit design. Problems such as these can be solved by approaches such as Boolean Satisfiability (SAT) or Constraint Programming (CP). Solver performance is affected significantly by the model chosen to represent a given problem, which has led to the study of model reformulation. One such method is tabulation: rewriting the expression of some of the model constraints in terms of a single “table” constraint. Successfully applying this process means identifying expressions amenable to transformation, which has typically been done manually. Recently, Akgun et al. introduced an automatic tabulation using a set of hand-designed heuristics to identify constraints to tabulate. However, the performance of these heuristics varies across problem classes and solvers. Recent work has shown learning techniques to be increasingly useful in the context of automatic model reformulation. The goal of this study is to understand whether it is possible to improve the performance of such heuristics, by learning a model to predict whether or not to activate them for a given instance. Experimental results suggest that a random forest classifier is the most robust choice, improving the performance of four different SAT and CP solvers.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
Machine Learning -> ML: Classification

3834

Sorting and Hypergraph Orientation under Uncertainty with Predictions

Thomas Erlebach, Murilo de Lima, Nicole Megow, Jens Schlöter

Learning-augmented algorithms have been attracting increasing interest, but have only recently been considered in the setting of explorable uncertainty where precise values of uncertain input elements can be obtained by a query and the goal is to minimize the number of queries needed to solve a problem. We study learning-augmented algorithms for sorting and hypergraph orientation under uncertainty, assuming access to untrusted predictions for the uncertain values. Our algorithms provide improved performance guarantees for accurate predictions while maintaining worst-case guarantees that are best possible without predictions. For hypergraph orientation, for any $\gamma \geq 2$, we give an algorithm that achieves a competitive ratio of $1+1/\gamma$ for correct predictions and $\gamma$ for arbitrarily wrong predictions. For sorting, we achieve an optimal solution for accurate predictions while still being $2$-competitive for arbitrarily wrong predictions. These tradeoffs are the best possible. We also consider different error metrics and show that the performance of our algorithms degrades smoothly with the prediction error in all the cases where this is possible.

List of keywords

Search -> S: Combinatorial search and optimisation
Planning and Scheduling -> PS: Planning under uncertainty

3835

Regularisation for Efficient Softmax Parameter Generation in Low-Resource Text Classifiers

Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

Meta-learning has made tremendous progress in recent years and has been shown to be particularly suitable in low-resource settings where training data is very limited. However, meta-learning models still require large numbers of training tasks to achieve good generalisation. Since labelled training data may be sparse, self-supervision-based approaches are able to further improve performance on downstream tasks. Although no labelled data is necessary for this training, a large corpus of unlabelled text needs to be available. In this paper, we improve on recent advances in meta-learning for natural language models that allow training on a diverse set of training tasks for few-shot, low-resource target tasks. We introduce a way to generate new training data without the need for additional supervised or unsupervised datasets. We evaluate the method on a diverse set of NLP tasks and show that the model decreases in performance when trained on this data without further adjustments. Therefore, we introduce and evaluate two methods for regularising the training process and show that they not only improve performance when used in conjunction with the new training data but also improve average performance when training only on the original data, compared to the baseline.

List of keywords

Natural Language Processing -> NLP: Text classification
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Meta-learning

3848

Distributional Multi-Objective Decision Making

Willem Röpke, Conor Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik Roijers

For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.

List of keywords

Uncertainty in AI -> UAI: Sequential decision making
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Other

3863

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

Yanrui Du, Jing Yan, Yan Chen, Jing Liu, Sendong Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Bing Qin

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word that highly co-occurs with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at the time of prediction, biased words make a significantly higher contribution to the models’ predictions, and models tend to assign predicted labels by over-relying on the spurious correlation between words and labels. To mitigate models’ over-reliance on the shortcut (i.e. spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve the model performance on adversarial data while maintaining good performance on in-domain data.

List of keywords

Natural Language Processing -> NLP: Question answering

3873

Principal-Agent Boolean Games

David Hyland, Julian Gutierrez, Michael Wooldridge

We introduce and study a computational version of the principal-agent problem — a classic problem in economics that arises when a principal desires to contract an agent to carry out some task, but has incomplete information about the agent or their subsequent actions. The key challenge in this setting is for the principal to design a contract for the agent such that the agent’s preferences are then aligned with those of the principal. We study this problem using a variation of Boolean games, where multiple players each choose valuations for Boolean variables under their control, seeking the satisfaction of a personal goal formula. In our setting, the principal can only observe some subset of these variables, and the principal chooses a contract which rewards players on the basis of the assignments they make for the variables that are observable to the principal. The principal’s challenge is to design a contract so that, firstly, the principal’s goal is achieved in some or all Nash equilibrium choices, and secondly, that the principal is able to verify that their goal is satisfied. In this paper, we formally define this problem and completely characterise the computational complexity of the most relevant decision problems associated with it.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis

3895

Towards Robust GAN-generated Image Detection: a Multi-view Completion Representation

Chi Liu, Tianqing Zhu, Sheng Shen, Wanlei Zhou

GAN-generated image detection has become the first line of defense against the malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well in detecting clean, known GAN samples, their success is largely attributable to overfitting unstable features such as frequency artifacts, which will cause failures when facing unknown GANs or perturbation attacks. To overcome the issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image tasks to model the diverse distributions of genuine images. Frequency-irrelevant features can be represented from the distributional discrepancies characterized by the completion models, which are stable, generalized, and robust for detecting unknown fake patterns. Then, a multi-view classification is devised with elaborated intra- and inter-view learning strategies to enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluated the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks. The results confirm our method’s improved effectiveness, generalization, and robustness over various baselines.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Security and privacy

3930

Matchings under One-Sided Preferences with Soft Quotas

Santhini K A, Raghu Raman Ravi, Meghana Nasre

Assigning applicants to posts in the presence of applicant preferences and quotas associated with posts has been extensively investigated. For a post, a lower quota guarantees, and an upper quota limits, the number of applicants assigned to it. Typically, quotas are assumed to be fixed, which need not be the case in practice. We address this by introducing a soft quota setting, in which every post is associated with two values, a lower target and an upper target, which together denote a range for the intended number of applicants in any assignment. Unlike the fixed quota setting, we allow the number of applicants assigned to a post to fall outside the range. This leads to assignments with deviation. Here, we study the problem of computing an assignment that has two orthogonal optimization objectives: minimizing the deviation (maximum or total) w.r.t. soft quotas and ensuring optimality w.r.t. preferences of applicants (rank-maximality or fairness). The order in which these objectives are considered and the different possibilities to optimize deviation, combined with the well-studied notions of optimality w.r.t. preferences, open up a range of optimization problems of practical importance. We present efficient algorithms based on flow-networks to solve these optimization problems.
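
To make the deviation objective concrete, the following minimal sketch assumes that a post's deviation is the amount by which its number of assigned applicants falls below the lower target or exceeds the upper target, and is zero inside the range. This reading, the function names, and the toy data are assumptions for illustration; the abstract does not give the formal definition, and the sketch does not reproduce the flow-based algorithms.

```python
def deviation(assigned, lower, upper):
    """Distance of `assigned` from the target range [lower, upper]; 0 if inside."""
    if assigned < lower:
        return lower - assigned
    if assigned > upper:
        return assigned - upper
    return 0


def max_deviation(counts, targets):
    """Largest deviation over all posts (hypothetical 'maximum' objective)."""
    return max(deviation(counts[p], *targets[p]) for p in targets)


def total_deviation(counts, targets):
    """Sum of deviations over all posts (hypothetical 'total' objective)."""
    return sum(deviation(counts[p], *targets[p]) for p in targets)


# Toy instance: post -> (lower target, upper target), and applicants per post.
targets = {"p1": (2, 3), "p2": (1, 2), "p3": (1, 1)}
counts = {"p1": 5, "p2": 1, "p3": 0}

print(max_deviation(counts, targets), total_deviation(counts, targets))
```

In this toy instance, p1 is over its range by two applicants and p3 is under by one, so the two objectives can rank assignments differently.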

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

3934

Safe Reinforcement Learning via Probabilistic Logic Shields

Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt

Safe Reinforcement Learning (Safe RL) aims at learning optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
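
One way to picture a probabilistic shield is as a reweighting of the base policy by the probability that each action is safe, followed by renormalisation. The sketch below is only that picture under stated assumptions: the safety probabilities are hard-coded toy numbers rather than the output of a probabilistic logic program, and the function name `shield` is hypothetical. It is not the authors' implementation.

```python
def shield(policy, p_safe):
    """Reweight action probabilities by their (assumed) safety probabilities.

    policy: dict action -> probability under the base policy
    p_safe: dict action -> probability that the action is safe in this state
    """
    weighted = {a: policy[a] * p_safe[a] for a in policy}
    normaliser = sum(weighted.values())  # probability that the policy acts safely
    if normaliser == 0:
        raise ValueError("no action has positive safe probability")
    return {a: w / normaliser for a, w in weighted.items()}


# Toy numbers, for illustration only.
policy = {"left": 0.5, "forward": 0.3, "right": 0.2}
p_safe = {"left": 0.1, "forward": 0.9, "right": 1.0}

shielded = shield(policy, p_safe)
print(shielded)
```

Because the reweighting consists only of products and a normalising sum, it is differentiable with respect to the base policy's probabilities, which is the property that lets a shield of this kind sit inside a policy gradient method.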

List of keywords

Uncertainty in AI -> UAI: Statistical relational AI
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Reinforcement learning

3935

Ranking-based Argumentation Semantics applied to Logical Argumentation

Jesse Heyninck, Badran Raddaoui, Christian Straßer

In formal argumentation, a distinction can be made between extension-based semantics, where sets of arguments are either (jointly) accepted or not, and ranking-based semantics, where grades of acceptability are assigned to arguments. Another important distinction is that between abstract approaches, that abstract away from the content of arguments, and structured approaches, that specify a method of constructing argument graphs on the basis of a knowledge base. While ranking-based semantics have been extensively applied to abstract argumentation, virtually no work has been done on ranking-based semantics for structured argumentation. In this paper, we make a systematic investigation into the behaviour of ranking-based semantics applied to existing formalisms for structured argumentation. We show that a wide class of ranking-based semantics give rise to so-called culpability measures, and are relatively robust to specific choices in argument construction methods.

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation

3943

Neuro-Symbolic Learning of Answer Set Programs from Raw Data

Daniel Cunnington, Mark Law, Jorge Lobo, Alessandra Russo

One of the ultimate goals of Artificial Intelligence is to assist humans in complex decision making. A promising direction for achieving this goal is Neuro-Symbolic AI, which aims to combine the interpretability of symbolic techniques with the ability of deep learning to learn from raw data. However, most current approaches require manually engineered symbolic knowledge, and where end-to-end training is considered, such approaches are either restricted to learning solutions with only one answer set, or are restricted to training binary neural networks. In this paper, we introduce Neuro-Symbolic Inductive Learner (NSIL), an approach that trains a general neural network to extract latent concepts from raw data, whilst learning symbolic knowledge that maps latent concepts to target labels. The novelty of our approach is a method for biasing the learning of symbolic knowledge, based on the in-training performance of both neural and symbolic components. We evaluate NSIL on three problem domains of different complexity, including an NP-complete problem. Our results demonstrate that NSIL learns expressive knowledge, solves computationally complex problems, and achieves state-of-the-art performance in terms of accuracy and data efficiency. Code and technical appendix: https://github.com/DanCunnington/NSIL

List of keywords

Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Logic programming

3948

Automatic Verification for Soundness of Bounded QNP Abstractions for Generalized Planning

Zhenhe Cui, Weidu Kuang, Yongmei Liu

Generalized planning (GP) studies the computation of general solutions for a set of planning problems. Computing general solutions with correctness guarantees has long been a key issue in GP. Abstractions are widely used to solve GP problems. For example, a popular abstraction model for GP is qualitative numeric planning (QNP), which extends classical planning with non-negative real variables that can be increased or decreased by some arbitrary amount. Refinements of correct solutions of sound abstractions are solutions with correctness guarantees for GP problems. Recently, Cui et al. proposed a uniform abstraction framework for GP. They gave model-theoretic definitions of sound and complete abstractions for GP problems. In this paper, based on Cui et al.’s work, we explore automatic verification of sound abstractions for GP. Firstly, we present a proof-theoretic characterization for sound abstractions. Secondly, based on the characterization, we give a first-order verifiable sufficient condition for sound abstractions with deterministic actions. Then we study how to verify the sufficient condition when the abstraction models are bounded QNPs where integer variables can be incremented or decremented by one. To this end, we develop methods to handle counting and transitive closure, which are often used to define numerical variables. Finally, we implement a sound bounded QNP abstraction verification system and report experimental results on several domains.

List of keywords

Knowledge Representation and Reasoning -> KRR: Reasoning about actions

3955

Enhancing Network by Reinforcement Learning and Neural Confined Local Search

Qifu Hu, Ruyang Li, Qi Deng, Yaqian Zhao, Rengang Li

It has been found that many real networks, such as power grids and the Internet, are non-robust, i.e., the failure of a small set of nodes would cause the paralysis of the entire network. Thus, the Network Enhancement Problem (NEP), i.e., improving the robustness of a given network by modifying its structure, has attracted increasing attention. Heuristics have been proposed to address NEP. However, developing a heuristic demands extensive domain knowledge, and the hand-engineered heuristic often has significant performance limitations. Recently, a reinforcement learning algorithm has been proposed to discover heuristics automatically. However, the algorithm shows considerably inferior performance in out-of-distribution generalization. This paper aims to learn more effective heuristics. We propose a decision model by constructing features from domain knowledge to enhance the state representation. Then we improve the proposed model by designing a hierarchical attention model, where the query range changes from local to global, to utilize the network structure directly. Finally, we propose the neural confined local search (NCLS) to realize the effective search of a large neighborhood, which exploits the learned decision policy to confine the neighborhood to avoid exhaustive enumeration. We conduct extensive experiments on synthetic and real networks to verify the effectiveness of our methods.

List of keywords

Data Mining -> DM: Mining graphs
Machine Learning -> ML: Deep reinforcement learning
Search -> S: Search and machine learning

3959

Orion: Online Backdoor Sample Detection via Evolution Deviance

Huayang Huang, Qian Wang, Xueluan Gong, Tao Wang

Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.

List of keywords

Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Machine Learning -> ML: Adversarial machine learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness

3962

Unpaired Video Dehazing via Cross-frame Supervision

Yang Yang, Chun-Le Guo, Xiaojie Guo

This paper investigates a novel unpaired video dehazing framework, which can be a good candidate in practical scenarios by relieving the pressure of collecting paired data. In such a paradigm, two key issues, 1) loose supervision during training and 2) temporal consistency, which is not involved in single-image dehazing, need to be considered for satisfactory performance. To handle these problems, we resort to constructing possible supervision across video frames. Specifically, we attempt to synthesize realistic motions with depth information to make unsupervised recycle and spatial consideration applicable, thus effectively regularizing the spatiotemporal consistency. Moreover, based on the observation that the visibility of the same object in a hazy scene changes with the camera motion, we devise an algorithm to search reference frames with lighter or denser haze for each frame in training videos. A cross-frame contrastive loss term between the reference frames and current frames is designed to provide extra guidance for further boosting the performance. Extensive experiments are conducted to validate the superiority of our method over other competitors. Code will be made publicly available.

List of keywords

Computer Vision -> CV: Computational photography
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

3964

Efficient Object Search in Game Maps

Jinchun Du, Bojie Shen, Shizhe Zhao, Muhammad Aamir Cheema, Adel Nadjaran Toosi

Video games feature a dynamic environment where locations of objects (e.g., characters, equipment, weapons, vehicles, etc.) frequently change within the game world. Although searching for relevant nearby objects in such a dynamic setting is a fundamental operation, this problem has received little research attention. In this paper, we propose a simple lightweight index, called Grid Tree, to store objects and their associated textual data. Our index can be efficiently updated with the underlying updates such as object movements, and supports a variety of object search queries, including k nearest neighbors (returning the k closest objects), keyword k nearest neighbors (returning the k closest objects that satisfy query keywords), and several other variants. Our extensive experimental study, conducted on standard game map benchmarks and real-world keywords, demonstrates that our approach has up to two orders of magnitude faster update times for moving objects compared to state-of-the-art approaches such as navigation mesh and IR-tree. At the same time, query performance of our approach is similar to or better than that of IR-tree and up to two orders of magnitude faster than the other competitor.

List of keywords

Search -> S: Heuristic search
Planning and Scheduling -> PS: Scheduling
Robotics -> ROB: Motion and path planning

3978

Participatory Budgeting: Data, Tools and Analysis

Piotr Faliszewski, Jarosław Flis, Dominik Peters, Grzegorz Pierczyński, Piotr Skowron, Dariusz Stolicki, Stanisław Szufa, Nimrod Talmon

We provide a library of participatory budgeting data (Pabulib) and open source tools (Pabutools and Pabustats) for analysing this data. We analyse how the results of participatory budgeting elections would change if a different selection rule were applied. We provide evidence that the outcomes of the Method of Equal Shares would be considerably fairer than those of the Utilitarian Greedy rule that is currently in use. We also show that the division of the projects into districts and/or categories can in many cases be avoided when using proportional rules. We find that this would increase the overall utility of the voters.
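
For readers unfamiliar with the baseline rule, the sketch below shows a greedy selection as it is commonly described: projects are considered in decreasing order of votes, and each one is funded if it still fits in the remaining budget. The function name, the alphabetical tie-breaking, and the toy data are assumptions for illustration; this is not the Pabutools API, and the Method of Equal Shares is not shown.

```python
def utilitarian_greedy(projects, budget):
    """Fund projects by decreasing vote count while they fit in the budget.

    projects: dict name -> (cost, votes); ties are broken by name (an assumption).
    """
    chosen = []
    remaining = budget
    ranked = sorted(projects.items(), key=lambda item: (-item[1][1], item[0]))
    for name, (cost, _votes) in ranked:
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


# Toy election, for illustration only.
projects = {
    "park": (60, 100),
    "library": (50, 80),
    "bike_lane": (40, 70),
    "mural": (10, 20),
}

print(utilitarian_greedy(projects, budget=100))
```

In this toy election the most popular project consumes most of the budget, so the second most popular one no longer fits and is skipped in favour of a cheaper one.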

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4004

Maximin-aware allocations of indivisible chores with symmetric and asymmetric agents

Tianze Wei, Bo Li, Minming Li

The real-world deployment of fair allocation algorithms usually involves a heterogeneous population of users, which makes it challenging for the users to get complete knowledge of the allocation except for their own bundles. Chan et al. [IJCAI 2019] proposed a new fairness notion, maximin-awareness (MMA), which guarantees that each agent is not the worst-off one, no matter how the items that are not allocated to this agent are distributed. We adapt and generalize this notion to the case of indivisible chores and to agents with arbitrary weights. Due to the inherent difficulty of MMA, we also consider its up-to-one and up-to-any relaxations. We give a string of results on the existence and computation of MMA-related fair allocations and on their connections to existing fairness concepts. Our results show some contrasts with the case of goods and the case with symmetric agents, and also improve some results for goods proved in [Chan et al., IJCAI 2019].

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Fair division

4015

Optimal Anytime Coalition Structure Generation Utilizing Compact Solution Space Representation

Redha TAGUELMIMT, Samir AKNINE, Djamila Boukredera, Narayan Changder, Tuomas Sandholm

Coalition formation is a central approach for multiagent coordination. A crucial part of coalition formation that is extensively studied in AI is coalition structure generation: partitioning agents into coalitions to maximize overall value. In this paper, we propose a novel method for coalition structure generation by introducing a compact and efficient representation of coalition structures. Our representation partitions the solution space into smaller, more manageable subspaces that gather structures containing coalitions of specific sizes. Our proposed method combines two new algorithms, one which leverages our compact representation and a branch-and-bound technique to generate optimal coalition structures, and another that utilizes a preprocessing phase to identify the most promising sets of coalitions to evaluate. Additionally, we show how parts of the solution space can be gathered into groups to avoid their redundant evaluation and we investigate the computational gain that is achieved by avoiding that redundant processing. Through this approach, our algorithm is able to prune the solution space more efficiently. Our results show that the proposed algorithm is superior to prior state-of-the-art methods in generating optimal coalition structures under several value distributions.
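
A standard way to group coalition structures by coalition sizes is through the integer partitions of the number of agents: each partition, such as (2, 1, 1) for four agents, stands for the subspace of all structures whose coalitions have exactly those sizes. The sketch below enumerates these partitions. Whether the paper's compact representation coincides with this grouping is not stated in the abstract, so treat it as an assumed illustration; the function name is hypothetical.

```python
def integer_partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield (part,) + rest


# Each tuple names one subspace of coalition structures for 4 agents.
subspaces = list(integer_partitions(4))
print(subspaces)
```

The number of such subspaces grows far more slowly than the number of coalition structures, which is what makes a size-based grouping attractive for bounding and pruning.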

List of keywords

Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

4025

SSML-QNet: Scale-Separative Metric Learning Quadruplet Network for Multi-modal Image Patch Matching

Xiuwei Zhang, Yi Sun, Yamin Han, Yanping Li, Hanlin Yin, Yinghui Xing, Yanning Zhang

Multi-modal image matching is very challenging due to the significant diversity in visual appearance across images of different modalities. Typically, the existing well-performing methods mainly focus on learning invariant and discriminative features for measuring the relation between multi-modal image pairs. However, these methods often take the features as a whole and largely overlook the fact that features at different scales for the same image pair may have different similarity, which may lead to sub-optimal results. In this work, we propose a Scale-Separative Metric Learning Quadruplet network (SSML-QNet) for multi-modal image patch matching. Specifically, SSML-QNet can extract both relevant and irrelevant features of imaging modality with the proposed quadruplet network architecture. Then, the proposed Scale-Separative Metric Learning module separately encodes the similarity of different scale features with the pyramid structure. For each scale, cross-modal consistent features are extracted and measured by coordinate and channel-wise attention sequentially. This makes our network robust to appearance divergence caused by different imaging mechanisms. Experiments on the benchmark datasets (VIS-NIR, VIS-LWIR, Optical-SAR, and Brown) have verified that the proposed SSML-QNet is able to outperform other state-of-the-art methods. Furthermore, cross-dataset transfer experiments on these four datasets have shown that the proposed method transfers well across datasets.

List of keywords

Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning

4026

Moral Planning Agents with LTL Values

Umberto Grandi, Emiliano Lorini, Timothy Parker

A moral planning agent (MPA) seeks to compare two plans or compute an optimal plan in an interactive setting with other agents, where relative ideality and optimality of plans are defined with respect to a prioritized value base. We model MPAs whose values are expressed by formulas of linear temporal logic (LTL) and define comparison for both joint plans and individual plans. We introduce different evaluation criteria for individual plans including an optimistic (risk-seeking) criterion, a pessimistic (risk-averse) one, and criteria based on the awareness of responsibility (responsibility-conscious). We provide complexity results for a variety of MPA problems.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Moral decision making
Agent-based and Multi-agent Systems -> MAS: Normative systems
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages

4043

Label Specific Multi-Semantics Metric Learning for Multi-Label Classification: Global Consideration Helps

Jun-Xiang Mao, Wei Wang, Min-Ling Zhang

In multi-label classification, it is critical to capitalize on complicated data structures and semantic relationships. Metric learning serves as an effective strategy to provide a better measurement of distances between examples. Existing works on metric learning for multi-label classification mainly learn one single global metric that characterizes latent semantic similarity between multi-label instances. However, such single-semantics metric exploitation approaches cannot capture the intrinsic properties of multi-label data, which possess rich semantics. In this paper, we make the first attempt towards multi-semantics metric learning for multi-label classification. Specifically, the proposed LIMIC approach simultaneously learns one global and multiple label-specific local metrics by exploiting label-specific side information. The global metric is learned to capture the commonality across all the labels, and label-specific local metrics characterize the individuality of each semantic space. The combination of global metric and label-specific local metrics is utilized to construct a latent semantic space for each label, in which similar intra-class instances are pulled closer and inter-class instances are pushed apart. Furthermore, metric-based label correlation regularization is constructed to maintain similarity between correlated label spaces. Extensive experiments on benchmark multi-label data sets validate the superiority of our proposed approach in learning effective distance metrics for multi-label classification.

List of keywords

Machine Learning -> ML: Multi-label
Machine Learning -> ML: Classification

4051

Solving the Identifying Code Set Problem with Grouped Independent Support

Anna Latour, Arunabha Sen, Kuldeep S Meel

An important problem in network science is finding an optimal placement of sensors in nodes in order to uniquely detect failures in the network. This problem can be modelled as an identifying code set (ICS) problem, introduced by Karpovsky et al. in 1998. The ICS problem aims to find a cover of a set S, such that the elements in the cover define a unique signature for each of the elements of S, and to minimise the cardinality of that cover. In this work, we study a generalised identifying code set (GICS) problem, where a unique signature must be found for each subset of S that has a cardinality of at most k (instead of just each element of S). The concept of an independent support of a Boolean formula was introduced by Chakraborty et al. in 2014 to speed up propositional model counting, by identifying a subset of variables whose truth assignments uniquely define those of the other variables. In this work, we introduce an extended version of independent support, grouped independent support (GIS), and show how to reduce the GICS problem to the GIS problem. We then propose a new solving method for finding a GICS, based on finding a GIS. We show that the prior state-of-the-art approaches yield integer-linear programming (ILP) models whose sizes grow exponentially with the problem size and k, while our GIS encoding only grows polynomially with the problem size and k. While the ILP approach can solve the GICS problem on networks of at most 494 nodes, the GIS-based method can handle networks of up to 21363 nodes; a ~40x improvement. The GIS-based method shows up to a 520x improvement on the ILP-based method in terms of average solving time. For the majority of the instances that can be encoded by both methods, the cardinality of the solution returned by the GIS-based method is less than 10% larger than the cardinality of the solution found by the ILP method.
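
The uniqueness requirement can be checked directly on a small graph. The sketch below assumes the usual graph formulation: the signature of a node is the part of the sensor set that lies in its closed neighbourhood, the signature of a set of nodes is the union of its members' signatures, and every signature must be non-empty and distinct. The function names, the non-emptiness condition, and the toy path graph are assumptions for illustration; this is a brute-force checker, not the GIS-based or ILP-based solving methods.

```python
from itertools import combinations


def closed_neighbourhood(graph, v):
    """The node v together with its neighbours."""
    return {v} | set(graph[v])


def is_identifying_code(graph, code, k=1):
    """True if every node set of size 1..k has a non-empty, unique signature."""
    code = set(code)
    seen = set()
    for size in range(1, k + 1):
        for subset in combinations(sorted(graph), size):
            covered = set()
            for v in subset:
                covered |= closed_neighbourhood(graph, v)
            signature = frozenset(covered & code)
            if not signature or signature in seen:
                return False
            seen.add(signature)
    return True


# Toy path graph 0 - 1 - 2 - 3, as adjacency lists.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(is_identifying_code(path, {0, 1, 2}))   # sensors on three nodes
print(is_identifying_code(path, {1, 2}))      # nodes 1 and 2 share a signature
```

The check enumerates all subsets up to size k, so it is only practical for tiny instances; it is meant to pin down the definition, not to scale.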

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Satisfiability

4062

Strategic Resource Selection with Homophilic Agents

Jonathan Gadea Harder, Simon Krogmann, Pascal Lenzner, Alexander Skopalik

The strategic selection of resources by selfish agents is a classical research direction, with Resource Selection Games and Congestion Games as prominent examples. In these games, agents select available resources and their utility then depends on the number of agents using the same resources. This implies that there is no distinction between the agents, i.e., they are anonymous. We depart from this very general setting by proposing Resource Selection Games with heterogeneous agents that strive for a joint resource usage with similar agents. So, instead of the number of other users of a given resource, our model considers agents with different types and the decisive feature is the fraction of same-type agents among the users. More precisely, similarly to Schelling Games, there is a tolerance threshold tau in [0,1] which specifies the agents’ desired minimum fraction of same-type agents on a resource. Agents strive to select resources where at least a tau-fraction of those resources’ users have the same type as themselves. For tau=1, our model generalizes hedonic diversity games with single-peaked utilities with a peak at 1. For our general model, we consider the existence and quality of equilibria and the complexity of maximizing the social welfare. Additionally, we consider a bounded rationality model, where agents can only estimate the utility of a resource, since they only know the fraction of same-type agents on a given resource, but not the exact numbers. Thus, they cannot know the impact a strategy change would have on a target resource. Interestingly, we show that this type of bounded rationality yields favorable game-theoretic properties and specific equilibria closely approximate equilibria of the full knowledge setting.
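
The role of the tolerance threshold can be illustrated by computing, for each agent, the fraction of same-type agents among the users of its chosen resource and comparing it with tau. The sketch below assumes that the agent counts itself among the users and reduces the comparison to a yes/no check; the abstract does not specify the exact utility function, so the function names and both choices are assumptions.

```python
def same_type_fraction(assignment, types, agent):
    """Fraction of users of the agent's resource that share the agent's type."""
    resource = assignment[agent]
    users = [a for a, r in assignment.items() if r == resource]
    same = sum(1 for a in users if types[a] == types[agent])
    return same / len(users)


def meets_threshold(assignment, types, agent, tau):
    """True if at least a tau-fraction of the resource's users share the agent's type."""
    return same_type_fraction(assignment, types, agent) >= tau


# Toy instance, for illustration only.
types = {"a": "red", "b": "red", "c": "blue", "d": "blue", "e": "blue"}
assignment = {"a": "r1", "b": "r1", "c": "r1", "d": "r2", "e": "r2"}

for agent in sorted(types):
    print(agent, meets_threshold(assignment, types, agent, tau=0.5))
```

With tau = 0.5, the blue agent on the mostly red resource is the only one below the threshold, which is the kind of agent that would consider switching resources.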

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Computational social choice

4068

Deliberation as Evidence Disclosure: A Tale of Two Protocol Types

Julian Chingoma, Adrian Haret

We study a model inspired by deliberative practice, in which agents selectively disclose evidence about a set of alternatives prior to taking a final decision on them. We are interested in whether such a process, when iterated to termination, results in the objectively best alternatives being selected—thereby lending support to the idea that groups can be wise even when their members communicate with each other. We find that, under certain restrictions on the relative amounts of evidence, together with the actions available to the agents, there exist deliberation protocols in each of the two families we look at (i.e., simultaneous and sequential) that offer desirable guarantees. Simulation results further complement this picture, by showing how the distribution of evidence among the agents influences parameters of interest, such as the outcome of the protocols and the number of rounds until termination.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4071

Generalization through Diversity: Improving Unsupervised Environment Design

Wenjun Li, Pradeep Varakantham, Dexun Li

Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8×8 maze with three rooms, playing Chess on an 8×8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learnt by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with high potential to learn), thereby making agent training redundant on all but one of those environments. To that end, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in the literature.

List of keywords

Planning and Scheduling -> PS: Search in planning and scheduling
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: POMDPs

4078

Co-Certificate Learning with SAT Modulo Symmetries

Markus Kirchweger, Tomas Peitl, Stefan Szeider

We present a new SAT-based method for generating all graphs up to isomorphism that satisfy a given co-NP property. Our method extends the SAT Modulo Symmetries (SMS) framework [Kirchweger and Szeider, 2021] with a new technique that we call co-certificate learning. If SMS generates a candidate graph that violates the given co-NP property, we obtain a certificate for this violation, i.e., a "co-certificate" for the co-NP property. The co-certificate gives rise to a clause that the SAT solver, serving as SMS’s backend, learns as part of its CDCL procedure. We demonstrate that SMS plus co-certificate learning is a powerful method that allows us to improve the best-known lower bound on the size of Kochen-Specker vector systems, a problem that is central to the foundations of quantum mechanics and has been studied for over half a century. Our approach is orders of magnitude faster and scales significantly better than a recent SAT-based method [Li et al., 2022].

List of keywords

Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Solvers and tools

4087

Quantitative Reasoning and Structural Complexity for Claim-Centric Argumentation

Markus Hecher, Johannes Fichte, Yasir Mahmood, Arne Meier

Argumentation is a well-established formalism for non-monotonic reasoning and a vibrant area of research in AI. Claim-augmented argumentation frameworks (CAFs) have been introduced to deploy a conclusion-oriented perspective. CAFs expand argumentation frameworks by an additional step which involves retaining claims for an accepted set of arguments. In this work, we explore the parameterized complexity of various reasoning problems for CAFs. We begin by presenting a suitable graph representation which includes arguments as well as their associated claims. Our analysis includes the parameter treewidth, and we present decomposition-guided reductions between reasoning problems in CAF and the validity problem for QBF. Furthermore, we introduce a novel concept of a justification status for claims, which is a quantitative measure on extensions supporting a particular claim. The well-studied problems of credulous and skeptical reasoning can then be seen as simply the two endpoints of the spectrum when considered as a justification level of a claim.

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning

4090

Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer

Yifan Wu, Shichao Kan, Min Zeng, Min Li

Transformers achieve excellent performance in a variety of domains since they can capture long-distance dependencies through the self-attention mechanism. However, self-attention is computationally costly due to its quadratic complexity and high memory consumption. In this paper, we propose a novel Transformer variant (Singularformer) that uses neural networks to learn the singular value decomposition process of the attention matrix to design a linear-complexity and memory-efficient global self-attention mechanism. Specifically, we decompose the attention matrix into the product of three matrix factors based on singular value decomposition and design neural networks to learn these matrix factors. The associative law of matrix multiplication is then used to linearize the calculation of self-attention. The above procedure allows us to compute self-attention as two dimensionality-reduction processes in the first and second token dimensional spaces, followed by a multi-head self-attention computation on the token features reduced in the first dimension. Experimental results on 8 real-world datasets demonstrate that Singularformer performs favorably against the other Transformer variants with lower time and space complexity. Our source code is publicly available at https://github.com/CSUBioGroup/Singularformer.
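
The linearization described in this abstract rests only on the associativity of matrix multiplication. The following is a minimal pure-Python sketch of that one idea, assuming the three factors are already given; in the paper they are learned by neural networks. The names `U`, `S`, `Vt`, `X`, `attention_quadratic` and `attention_linear` are illustrative assumptions and are not taken from the Singularformer code.

```python
# Sketch: regrouping a factorized attention product avoids the n x n matrix.
# U (n x k), S (k x k), Vt (k x n) are hypothetical stand-ins for the learned
# factors; X (n x d) holds the token values.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention_quadratic(U, S, Vt, X):
    # Materializes the full n x n attention matrix before applying it.
    A = matmul(matmul(U, S), Vt)
    return matmul(A, X)

def attention_linear(U, S, Vt, X):
    # Same product, regrouped: intermediates are k x d, never n x n.
    return matmul(U, matmul(S, matmul(Vt, X)))

n, k, d = 6, 2, 3
U = [[(i + j) % 3 + 1.0 for j in range(k)] for i in range(n)]
S = [[2.0, 0.0], [0.0, 0.5]]
Vt = [[(i * j) % 4 + 1.0 for j in range(n)] for i in range(k)]
X = [[float(i - j) for j in range(d)] for i in range(n)]

out_q = attention_quadratic(U, S, Vt, X)
out_l = attention_linear(U, S, Vt, X)
```

Both functions return the same n x d result; only the grouping, and therefore the size of the intermediate matrices, differs.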

List of keywords

Machine Learning -> ML: Attention models

4097

Transferable Curricula through Difficulty Conditioned Generators

Sidney Tio, Pradeep Varakantham

Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as StarCraft, Go, and chess. However, knowledge transfer from artificial "experts" to humans remains a significant challenge. A promising avenue for such transfer is the use of curricula. Recent methods in curriculum generation focus on training RL agents efficiently, yet such methods require millions of experiences to train a single agent and are not suited for human training. In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained more efficiently in the "zone of proximal development", our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM does not require RL updates and can be trained offline, making it suitable for training humans. We demonstrate PERM’s ability to represent the environment parameter space, and show that training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students without any sacrifice in training quality.
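
The matching of environment difficulty to student ability can be illustrated with classical Item Response Theory. The sketch below uses the one-parameter (Rasch) model as a stand-in, since the abstract does not specify PERM's actual model. The functions `success_probability` and `pick_environment` and the numeric difficulties are hypothetical.

```python
import math

def success_probability(ability, difficulty):
    """Rasch (1PL) item response model: sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_environment(ability, difficulties):
    """Index of the environment whose difficulty is closest to the ability."""
    return min(range(len(difficulties)),
               key=lambda i: abs(difficulties[i] - ability))

# Hypothetical difficulty estimates for five parameterized environments.
difficulties = [-2.0, -0.5, 0.4, 1.5, 3.0]
student_ability = 0.5

chosen = pick_environment(student_ability, difficulties)
p_success = success_probability(student_ability, difficulties[chosen])
```

Under this model, an environment whose difficulty equals the student's ability gives a success probability of one half, which is one simple way to read "matching difficulty to ability".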

List of keywords

Multidisciplinary Topics and Applications -> MDA: Social sciences
Humans and AI -> HAI: Computer-aided education
Multidisciplinary Topics and Applications -> MDA: Game playing

4104

Strip Attention for Image Restoration

Yuning Cui, Yi Tao, Luoxi Jing, Alois Knoll

As a long-standing topic, image restoration aims to recover the latent sharp image from its degraded counterpart. In recent years, owing to the strong ability of self-attention to capture long-range dependencies, Transformer-based methods have achieved promising performance on a wide range of image restoration tasks. However, canonical self-attention has quadratic complexity with respect to input size, which hinders its further application in image restoration. In this paper, we propose a Strip Attention Network (SANet) for image restoration to integrate contextual information in a more efficient and effective manner. Specifically, a strip attention unit is proposed to harvest the contextual information for each pixel from its adjacent pixels in the same row or column. By employing this operation in different directions, each location can perceive information from an expanded region. Furthermore, we apply various receptive fields in different feature groups to enhance representation learning. Incorporating these designs into a U-shaped backbone, our SANet performs favorably against state-of-the-art algorithms on several image restoration tasks. The code and pre-trained models will be publicly available upon acceptance.

List of keywords

Computer Vision -> CV: Other
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Representation learning

4118

Multi-Agent Intention Recognition and Progression

Michael Dann, Yuan Yao, Natasha Alechina, Brian Logan, Felipe Meneguzzi, John Thangarajah

For an agent in a multi-agent environment, it is often beneficial to be able to predict what other agents will do next when deciding how to act. Previous work in multi-agent intention scheduling assumes a priori knowledge of the current goals of other agents. In this paper, we present a new approach to multi-agent intention scheduling in which an agent uses online goal recognition to identify the goals currently being pursued by other agents while acting in pursuit of its own goals. We show how online goal recognition can be incorporated into an MCTS-based intention scheduler, and evaluate our approach in a range of scenarios. The results demonstrate that our approach can rapidly recognise the goals of other agents even when they are pursuing multiple goals concurrently, and has similar performance to agents which know the goals of other agents a priori.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Planning and Scheduling -> PS: Activity and plan recognition

4122

Truthful Fair Mechanisms for Allocating Mixed Divisible and Indivisible Goods

Zihao Li, Shengxin Liu, Xinhang Lu, Biaoshuai Tao

We study the problem of designing truthful and fair mechanisms when allocating a mixture of divisible and indivisible goods. We first show that there does not exist an EFM (envy-free for mixed goods) and truthful mechanism in general. This impossibility result holds even if there is only one indivisible good and one divisible good and there are only two agents. Thus, we focus on some more restricted settings. Under the setting where agents have binary valuations on indivisible goods and identical valuations on a single divisible good (e.g., money), we design an EFM and truthful mechanism. When agents have binary valuations over both divisible and indivisible goods, we first show there exist EFM and truthful mechanisms when there are only two agents or when there is a single divisible good. On the other hand, we show that the mechanism maximizing Nash welfare cannot ensure EFM and truthfulness simultaneously.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Mechanism design

4124

Online Harmonizing Gradient Descent for Imbalanced Data Streams One-Pass Classification

Han Zhou, Hongpeng Yin, Xuanhong Deng, Yuyu Huang

Many real-world data streams are collected sequentially over time and have skewed class distributions. In this situation, online learning models tend to favor samples from majority classes and make wrong decisions for those from minority classes. Previous methods try to balance the number of instances across classes or assign asymmetric cost values. They usually require data buffers to store streaming data or pre-defined cost parameters. This study instead shows that the imbalance of instances is reflected in the imbalance of gradients. We then propose Online Harmonizing Gradient Descent (OHGD) for one-pass online classification. By harmonizing the gradient magnitudes induced by different classes, the method avoids bias in favor of the majority class. Specifically, OHGD requires no data buffer, extra parameters, or prior knowledge. It also handles imbalanced data streams in the same way as balanced data streams, which makes it easy to implement. Under a few common and mild assumptions, the theoretical analysis proves that OHGD enjoys a sub-linear regret bound. Extensive experimental results demonstrate its high efficiency and effectiveness in handling imbalanced data streams.

List of keywords

Data Mining -> DM: Mining data streams
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Classification

4132

Lifelong Multi-view Spectral Clustering

Hecheng Cai, Yuze Tan, Shudong Huang, Jiancheng Lv

In recent years, spectral clustering has become a well-known and effective algorithm in machine learning. However, traditional spectral clustering algorithms are designed for single-view data and a fixed task setting. This becomes a limitation when dealing with new tasks in a sequence, as it requires access to previously learned tasks and hence leads to high storage consumption, especially for multi-view datasets. In this paper, we address this limitation by introducing a lifelong multi-view clustering framework. Our approach uses view-specific knowledge libraries to capture intra-view knowledge across different tasks. Specifically, we propose two types of libraries: an orthogonal basis library that stores cluster centers in consecutive tasks, and a feature embedding library that embeds feature relations shared among correlated tasks. When a new clustering task arrives, the knowledge is iteratively transferred from the libraries to encode the new task, and the knowledge libraries are updated according to the online update formulation. Meanwhile, basis libraries of different views are further fused into a consensus library with adaptive weights. Experimental results show that our proposed method outperforms other competitive clustering methods on multi-view datasets by a large margin.

List of keywords

Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning

4139

Recursive Small-Step Multi-Agent A* for Dec-POMDPs

Wietze Koops, Nils Jansen, Sebastian Junges, Thiago Simão

We present recursive small-step multi-agent A* (RS-MAA*), an exact algorithm that optimizes the expected reward in decentralized partially observable Markov decision processes (Dec-POMDPs). RS-MAA* builds on multi-agent A* (MAA*), an algorithm that finds policies by exploring a search tree, but tackles two major scalability concerns. First, we employ a modified, small-step variant of the search tree that avoids the double exponential outdegree of the classical formulation. Second, we use a tight and recursive heuristic that we compute on-the-fly, thereby avoiding an expensive precomputation. The resulting algorithm is conceptually simple, yet it shows superior performance on a rich set of standard benchmarks.

List of keywords

Planning and Scheduling -> PS: Distributed and multi-agent planning
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Planning under uncertainty

4148

ScriptWorld: Text Based Environment For Learning Procedural Knowledge

Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi

Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld, a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop reinforcement learning based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments.

List of keywords

Natural Language Processing -> NLP: Applications
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning

4152

Model Predictive Control with Reach-avoid Analysis

Dejin Ren, Wanli Lu, Jidong Lv, Lijun Zhang, Bai Xue

In this paper we investigate the optimal controller synthesis problem, so that the system under the controller can reach a specified target set while satisfying given constraints. Existing model predictive control (MPC) methods learn from a set of discrete states visited by previous (sub-)optimized trajectories and thus result in computationally expensive mixed-integer nonlinear optimization. In this paper a novel MPC method is proposed based on reach-avoid analysis to solve the controller synthesis problem iteratively. The reach-avoid analysis is concerned with computing a reach-avoid set which is a set of initial states such that the system can reach the target set successfully. It not only provides terminal constraints, which ensure feasibility of MPC, but also expands discrete states in existing methods into a continuous set (i.e., reach-avoid sets) and thus leads to nonlinear optimization which is more computationally tractable online due to the absence of integer variables. Finally, we evaluate the proposed method and make comparisons with state-of-the-art ones based on several examples.

List of keywords

Planning and Scheduling -> PS: Learning in planning and scheduling
Machine Learning -> ML: Optimization

4163

Treewidth-Aware Complexity for Evaluating Epistemic Logic Programs

Jorge Fandinno, Markus Hecher

Logic programs are a popular formalism for encoding many problems relevant to knowledge representation and reasoning as well as artificial intelligence. However, modeling rational behavior often requires representing the concepts of knowledge and possibility. Epistemic logic programs (ELPs) are an extension of logic programs that enables both concepts, which respectively correspond to being true in all or some possible worlds or stable models. While the classical complexity of evaluating these epistemic logic programs is well-studied, parameterized studies aiming for a more fine-grained complexity analysis are limited. In particular, the parameter treewidth has regained popularity in the context of epistemic logic programs. We present new results for the evaluation of key fragments of ELPs, which for treewidth are exponentially better than the known results for disjunctive ELPs. Unfortunately, we also prove that the runtimes we obtain cannot be significantly improved, assuming the exponential time hypothesis. Our results work by defining treewidth-aware reductions between quantified Boolean formulas and ELPs. As a side result, we thereby establish that the completion of a logic program, as used in modern solvers, can be made treewidth-aware such that the treewidth is linearly preserved.

List of keywords

Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning

4168

ALL-E: Aesthetics-guided Low-light Image Enhancement

Ling Li, Dong Liang, Yuanhang Gao, Sheng-Jun Huang, Songcan Chen

Evaluating the performance of low-light image enhancement (LLE) is highly subjective, thus making integrating human preferences into image enhancement a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic criteria for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and motivates training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code and models are in the supplementary material for evaluation.

List of keywords

Computer Vision -> CV: Computational photography
AI Ethics, Trust, Fairness -> ETF: Societal impact of AI
Humans and AI -> HAI: Personalization and user modeling

4170

Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks

Bing Han, Feifei Zhao, Yi Zeng, Wenxuan Pan, Guobin Shen

Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to deep neural networks (DNNs) and lack exploration of more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons for new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly apply all acquired knowledge to new tasks, enabling a single network to support multiple incremental tasks (without a separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class-incremental learning and task-incremental learning benchmarks. Extensive experiments demonstrate that our model significantly improves performance, learning speed and memory capacity, and reduces computational overhead. Moreover, our DSD-SNN model achieves performance comparable to DNN-based methods and significantly outperforms the state-of-the-art (SOTA) SNN-based continual learning methods.

List of keywords

Humans and AI -> HAI: Cognitive modeling

4172

An Experimental Comparison of Multiwinner Voting Rules on Approval Elections

Piotr Faliszewski, Martin Lackner, Krzysztof Sornat, Stanisław Szufa

In this paper, we experimentally compare major approval-based multiwinner voting rules. To this end, we define a measure of similarity between two equal-sized committees subject to a given election. Using synthetic elections coming from several distributions, we analyze how similar the committees provided by prominent voting rules are. Our results can be visualized as “maps of voting rules”, which provide a counterpoint to a purely axiomatic classification of voting rules. We further investigate the relation of axiomatic analysis to our approach by evaluating how frequently committees computed by our rules satisfy proportionality properties.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4184

Towards Sharp Analysis for Distributed Learning with Random Features

Jian Li, Yong Liu

In recent studies, the generalization properties for distributed learning and random features assumed the existence of the target concept over the hypothesis space. However, this strict condition is not applicable to the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via a data-dependent generating strategy, and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows that these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.

List of keywords

Machine Learning -> ML: Learning theory
Machine Learning -> ML: Kernel methods

4194

Exploring Effective Inter-encoder Semantic Interaction for Document-Level Relation Extraction

Liang Zhang, Zijun Min, Jinsong Su, Pei Yu, Ante Wang, Yidong Chen

In document-level relation extraction (RE), models are required to correctly predict implicit relations in documents via relational reasoning. To this end, many graph-based methods have been proposed for this task. Despite their success, these methods still suffer from several drawbacks: 1) the interaction between their document encoder and graph encoder is usually unidirectional and insufficient; 2) their graph encoders often fail to capture the global context of nodes in the document graph. In this paper, we propose a document-level RE model with a Graph-Transformer Network (GTN). The GTN includes two core sublayers: 1) the graph-attention sublayer, which simultaneously models global and local contexts of nodes in the document graph; 2) the cross-attention sublayer, which enables GTN to capture non-entity clue information from the document encoder. Furthermore, we introduce two auxiliary training tasks to enhance the bidirectional semantic interaction between the document encoder and GTN: 1) graph node reconstruction, which trains our cross-attention sublayer to enhance the semantic transition from the document encoder to GTN; 2) structure-aware adversarial knowledge distillation, by which we transfer the structural information of GTN to the document encoder. Experimental results on four benchmark datasets demonstrate the effectiveness of our model.

List of keywords

Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Information retrieval and text mining

4195

Multi-view Contrastive Learning Hypergraph Neural Network for Drug-Microbe-Disease Association Prediction

Luotao Liu, Feng Huang, Xuan Liu, Zhankun Xiong, Menglu Li, Congzhi Song, Wen Zhang

Identifying the potential associations among drugs, microbes and diseases is of great significance in exploring the pathogenesis and improving precision medicine. There are plenty of computational methods for pair-wise association prediction, such as drug-microbe and microbe-disease associations, but few methods focus on the higher-order triple-wise drug-microbe-disease (DMD) associations. Driven by the advancement of hypergraph neural networks (HGNNs), we expect them to fully capture high-order interaction patterns behind the hypergraph formulated by DMD associations and realize sound prediction performance. However, the confirmed DMD associations are insufficient due to the high cost of in vitro screening, which forms a sparse DMD hypergraph and thus brings in suboptimal generalization ability. To mitigate the limitation, we propose a Multi-view Contrastive Learning Hypergraph Neural Network, named MCHNN, for DMD association prediction. We design a novel multi-view contrastive learning on the DMD hypergraph as an auxiliary task, which guides the HGNN to learn more discriminative representations and enhances the generalization ability. Extensive computational experiments show that MCHNN achieves satisfactory performance in DMD association prediction and, more importantly, demonstrate the effectiveness of our devised multi-view contrastive learning on the sparse DMD hypergraph.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine

4206

In Which Graph Structures Can We Efficiently Find Temporally Disjoint Paths and Walks?

Pascal Kunz, Hendrik Molter, Meirav Zehavi

The study of the computational complexity of finding temporally disjoint paths or walks in temporal graphs has recently been initiated by Klobas et al. [IJCAI ’21]. A temporal graph has an edge set that may change over discrete time steps and a temporal path (or walk) must traverse edges that appear at increasing time steps. Two temporal paths (or walks) are temporally disjoint if they do not visit any vertex at the same time. The problem of finding such paths or walks is motivated by applications in multi-agent path finding (MAPF) which include robotics, warehouse management, aircraft management, and traffic routing. We extend Klobas et al.’s research by providing parameterized hardness results for very restricted cases, with a focus on structural parameters of the so-called underlying graph. On the positive side, we identify sufficiently simple cases where we can solve the problem efficiently. Our results reveal some surprising differences between the “path version” and the “walk version” (where vertices may be visited multiple times) of the problem and answer several open questions posed by Klobas et al.
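
The notion of temporal disjointness stated in this abstract can be sketched as a simple check. The representation below is an assumption made for illustration: each walk is a list of `(vertex, time)` pairs and occupies a vertex only at the listed time steps. The paper's formal definition (for example, how long an agent occupies a vertex while waiting) may differ, and `temporally_disjoint` is a hypothetical helper name.

```python
def temporally_disjoint(walk_a, walk_b):
    """True if the two walks never occupy the same vertex at the same time.

    Each walk is a list of (vertex, time) pairs; this point-in-time view of
    occupancy is a simplifying assumption.
    """
    return set(walk_a).isdisjoint(walk_b)

walk_1 = [("s1", 0), ("v", 1), ("t1", 2)]
walk_2 = [("s2", 0), ("v", 2), ("t2", 3)]  # also uses v, but one step later
walk_3 = [("s3", 0), ("v", 1), ("t3", 2)]  # meets walk_1 at vertex v, time 1

ok_pair = temporally_disjoint(walk_1, walk_2)
clash_pair = temporally_disjoint(walk_1, walk_3)
```

Note that `walk_1` and `walk_2` share the vertex `v` yet remain temporally disjoint, which is what separates this notion from ordinary vertex-disjointness.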

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Routing

4215

Semantic-Aware Generation of Multi-View Portrait Drawings

Biao Ma, Fei Gao, Chang Jiang, Nannan Wang, Gang Xu

Neural radiance fields (NeRF) based methods have shown amazing performance in synthesizing 3D-consistent photographic images, but fail to generate multi-view portrait drawings. The key is that the basic assumption of these methods (a surface point is consistent when rendered from different views) does not hold for drawings. In a portrait drawing, the appearance of a facial point may change when viewed from different angles. Besides, portrait drawings usually present little 3D information and suffer from insufficient training data. To combat this challenge, in this paper, we propose a Semantic-Aware GEnerator (SAGE) for synthesizing multi-view portrait drawings. Our motivation is that facial semantic labels are view-consistent and correlate with drawing techniques. We therefore propose to collaboratively synthesize multi-view semantic maps and the corresponding portrait drawings. To facilitate training, we design a semantic-aware domain translator, which generates portrait drawings based on features of photographic faces. In addition, we use data augmentation via synthesis to mitigate collapsed results. We apply SAGE to synthesize multi-view portrait drawings in diverse artistic styles. Experimental results show that SAGE achieves significantly superior or highly competitive performance, compared to existing 3D-aware image synthesis methods. The code is available at https://github.com/AiArt-HDU/SAGE.

List of keywords

Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Arts and creativity

4226

Depth-Relative Self Attention for Monocular Depth Estimation

Kyuhong Shim, Jiyoung Kim, Gusang Lee, Byonghyo Shim

Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome this limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can become biased toward RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of pixels at similar depths become more similar to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased toward RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.
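
The idea of giving higher attention weight to pixels of close depth can be sketched with a softmax over negative depth differences. This is only an illustration of the stated principle, not RED-T's actual formulation, which the abstract does not give. The function `depth_relative_weights` and its `temperature` parameter are hypothetical.

```python
import math

def depth_relative_weights(query_depth, key_depths, temperature=1.0):
    """Softmax over negative absolute depth differences.

    Keys whose depth is close to the query depth receive larger weights.
    """
    scores = [-abs(query_depth - d) / temperature for d in key_depths]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One query pixel at depth 2.0 attending to three pixels at other depths.
weights = depth_relative_weights(2.0, [2.1, 2.5, 8.0])
```

The weights sum to one and decrease as the depth gap grows, so the far pixel at depth 8.0 contributes almost nothing to the query's updated feature.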

List of keywords

Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Applications

4228

Auto-bidding with Budget and ROI Constrained Buyers

Xiaodong Liu, Weiran Shen

In online advertising markets, an increasing number of advertisers are adopting auto-bidders to buy advertising slots. Such a tool can simplify the whole process of optimizing their bids according to different financial constraints. We study second-price auctions where bidders have both private budget and private ROI (return on investment) constraints. We formulate the auto-bidding system design problem as a mathematical program and analyze the auto-bidders’ bidding strategy under such constraints. We show that our design ensures truthfulness, i.e., among all pure and mixed strategies, always reporting the truthful budget and ROI is an optimal strategy for the bidders. Although the program is non-convex, we provide a fast algorithm to compute the optimal bidding strategy for the bidders based on our analysis. We also study the welfare and provide a lower bound for the PoA (price of anarchy). Furthermore, we show that if all bidders use our auto-bidding system, there exists a Bayesian Nash equilibrium and we give a sufficient condition under which the iterated best response process converges to such an equilibrium. Finally, we also conduct extensive experiments to empirically evaluate our design.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design

4229

Targeting Minimal Rare Itemsets from Transaction Databases

Amel Hidouri, Badran Raddaoui, Said Jabbour

The computation of minimal rare itemsets is a well-known task in data mining, with numerous applications, e.g., drug effects analysis and network security. This paper introduces a SAT-based formulation to efficiently discover minimal rare itemsets from transaction databases. Then, we extend our proposed approach to mine a new kind of pattern, which we call a k-minimal rare itemset. A k-minimal rare itemset X is a minimal rare itemset where X \ Y s.t. |Y| ≤ k − 1 is frequent. To our knowledge, this makes our work the first approach to generalise the traditional model of rare itemsets. Afterwards, to scale up, we decompose the initial problem into smaller, more manageable and easier-to-solve sub-problems to find the entire set of itemsets. Finally, we demonstrate the effectiveness and efficiency of our approach through extensive experimental analysis on a variety of popular datasets in comparison to existing specialized and CP-based algorithms.
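
For reference, the standard notion of a minimal rare itemset (rare itself, while every proper subset is frequent) can be checked by brute force. The sketch below is not the SAT-based formulation from the paper and does not cover the k-minimal generalization. The helper names, the toy transactions and the absolute `min_support` threshold are assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_minimal_rare(itemset, transactions, min_support):
    """Rare itself, while every subset obtained by dropping one item is frequent.

    Checking only the subsets of size |X| - 1 is enough, because the support
    of a subset is never smaller than the support of its superset.
    """
    if support(itemset, transactions) >= min_support:
        return False
    return all(support(frozenset(sub), transactions) >= min_support
               for sub in combinations(itemset, len(itemset) - 1))

# Toy database: supports are a=4, b=3, c=3, ab=2, ac=2, bc=2, abc=1.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
                frozenset("bc"), frozenset("a")]

abc_at_2 = is_minimal_rare(frozenset("abc"), transactions, 2)
ab_at_3 = is_minimal_rare(frozenset("ab"), transactions, 3)
abc_at_3 = is_minimal_rare(frozenset("abc"), transactions, 3)
```

With a threshold of 2, `{a, b, c}` is minimal rare; raising the threshold to 3 makes `{a, b}` minimal rare instead, and `{a, b, c}` is then rare but no longer minimal.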

List of keywords

Constraint Satisfaction and Optimization -> CSO: Satisfiability
Data Mining -> DM: Frequent pattern mining

4236

Denial-of-Service or Fine-Grained Control: Towards Flexible Model Poisoning Attacks on Federated Learning

Hangtao Zhang, Zeming Yao, LEO YU ZHANG, Shengshan Hu, Chao Chen, Alan Liew, Zhetao Li

Federated learning (FL) is vulnerable to poisoning attacks, where adversaries corrupt the global aggregation results and cause denial-of-service (DoS). Unlike recent model poisoning attacks that optimize the amplitude of malicious perturbations along certain prescribed directions to cause DoS, we propose a flexible model poisoning attack (FMPA) that can achieve versatile attack goals. We consider a practical threat scenario where no extra knowledge about the FL system (e.g., aggregation rules or updates on benign devices) is available to adversaries. FMPA exploits the global historical information to construct an estimator that predicts the next round of the global model as a benign reference. It then fine-tunes the reference model to obtain the desired poisoned model with low accuracy and small perturbations. Besides the goal of causing DoS, FMPA can be naturally extended to launch a fine-grained controllable attack, making it possible to precisely reduce the global accuracy. Armed with precise control, malicious FL service providers can gain advantages over their competitors without getting noticed, hence opening a new attack surface in FL other than DoS. Even for the purpose of DoS, experiments show that FMPA significantly decreases the global accuracy, outperforming six state-of-the-art attacks.

List of keywords

Machine Learning -> ML: Federated learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Adversarial machine learning

4248

Topological Planning with Post-unique and Unary Actions

Guillaume Prévost, Stéphane CARDON, Tristan Cazenave, Christophe GUETTIER, Éric JACOPIN

We are interested in realistic planning problems to model the behavior of Non-Playable Characters (NPCs) in video games. Search-based action planning, introduced by the game F.E.A.R. in 2005, has an exponential time complexity, which allows controlling only a dozen NPCs between two frames. A close study of the plans generated in first-person shooters shows that: (1) actions are unary, (2) actions are contextually post-unique and (3) there are no two instances of the same action in an NPC’s plan. By considering (1), (2) and (3) as restrictions, we introduce new classes of problems with the Simplified Action Structure formalism, which allow modeling realistic problems and whose instances are solvable by a linear-time algorithm. We also experimentally show that our algorithm is capable of managing millions of NPCs per frame.

List of keywords

Planning and Scheduling -> PS: Planning algorithms
Multidisciplinary Topics and Applications -> MDA: Computer games
Planning and Scheduling -> PS: Real-time planning

4251

Unveiling Concepts Learned by a World-Class Chess-Playing Agent

Yngvi Bjornsson, Adalsteinn Palsson

In recent years, the state-of-the-art agents for playing abstract board games, like chess and others, have moved from using intricate hand-crafted models for evaluating the merits of individual game states toward using neural networks (NNs). This development has eased the encapsulation of the relevant domain-specific knowledge and resulted in much-improved playing strength. However, this has come at the cost of making the resulting models hard to interpret and challenging to understand and use for enhancing human knowledge. Using a world-class superhuman-strength chess-playing engine as our testbed, we show how recent model probing interpretability techniques can shed light on concepts learned by the engine’s NN. Furthermore, to gain additional insight, we contrast the game-state evaluations of the NN with those of its counterpart hand-crafted evaluation model and identify and explain some of the main differences.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Game playing
Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Game playing

4255

Multi-View Robust Graph Representation Learning for Graph Classification

Guanghui Ma, Chunming Hu, Ling Ge, Hong Zhang

The robustness of graph classification models plays an essential role in providing highly reliable applications. Previous studies along this line primarily focus on seeking the stability of the model in terms of overall data metrics (e.g., accuracy) when facing data perturbations, such as removing edges. Empirically, we find that these graph classification models also suffer from semantic bias and confidence collapse issues, which substantially hinder their applicability in real-world scenarios. To address these issues, we present MGRL, a multi-view representation learning model for graph classification tasks that achieves robust results. Firstly, we propose an instance-view consistency representation learning method, which utilizes a multi-granularity contrastive learning technique to impose semantic constraints on instance representations at both the node and graph levels, thus alleviating the semantic bias issue. Secondly, we propose a class-view discriminative representation learning method, which employs a prototype-driven class distance optimization technique to adjust intra- and inter-class distances, thereby mitigating the confidence collapse issue. Finally, extensive experiments and visualizations on eight benchmark datasets demonstrate the effectiveness of MGRL.

List of keywords

Machine Learning -> ML: Robustness
Machine Learning -> ML: Representation learning

4257

Automatic Truss Design with Reinforcement Learning

Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu

Truss layout design, namely finding a lightweight truss layout satisfying all the physical constraints, is a fundamental problem in the building industry. Generating the optimal layout is a challenging combinatorial optimization problem, which can be extremely expensive to solve by exhaustive search. Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training. In this paper, we develop AutoTruss, a two-stage framework to efficiently generate both lightweight and valid truss layouts. AutoTruss first adopts Monte Carlo tree search to discover a diverse collection of valid layouts. Then RL is applied to iteratively refine the valid solutions. We conduct experiments and ablation studies in popular truss layout design test cases in both 2D and 3D settings. AutoTruss outperforms the best-reported layouts by 25.1% in the most challenging 3D test cases, resulting in the first effective deep-RL-based approach in the truss layout design literature.

List of keywords

Machine Learning -> ML: Applications
Machine Learning -> ML: Reinforcement learning
Search -> S: Combinatorial search and optimisation

4271

Algorithmics of Egalitarian versus Equitable Sequences of Committees

Eva Michelle Deltl, Till Fluschnik, Robert Bredereck

We study the election of sequences of committees, where in each of tau levels (e.g. modeling points in time) a committee consisting of k candidates from a common set of m candidates is selected. For each level, each of n agents (voters) may nominate one candidate whose selection would satisfy her. We are interested in committees which are good with respect to the satisfaction per day and per agent. More precisely, we look for egalitarian or equitable committee sequences. While both guarantee that at least x agents per day are satisfied, egalitarian committee sequences ensure that each agent is satisfied in at least y levels while equitable committee sequences ensure that each agent is satisfied in exactly y levels. We analyze the parameterized complexity of finding such committees for the parameters n, m, k, tau, x, and y, as well as combinations thereof.
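The two notions defined in this abstract can be checked mechanically. The following is a minimal sketch under an assumed encoding, not the authors' algorithm: `nominations[t][a]` holds the candidate that agent `a` nominates at level `t` (or `None`), `committees[t]` is the committee chosen at level `t`, and the function name and signature are hypothetical.

```python
from typing import List, Optional, Set


def is_valid_sequence(
    nominations: List[List[Optional[str]]],
    committees: List[Set[str]],
    k: int,
    x: int,
    y: int,
    equitable: bool = False,
) -> bool:
    """Check the egalitarian (default) or equitable condition.

    An agent is satisfied at level t if her nominee is in committees[t].
    """
    levels = len(committees)
    n_agents = len(nominations[0])

    # Every committee must consist of exactly k candidates.
    if any(len(c) != k for c in committees):
        return False

    # At least x agents must be satisfied at every level.
    for t in range(levels):
        if sum(nominations[t][a] in committees[t] for a in range(n_agents)) < x:
            return False

    # Each agent: at least y satisfied levels (egalitarian)
    # or exactly y satisfied levels (equitable).
    for a in range(n_agents):
        hits = sum(nominations[t][a] in committees[t] for t in range(levels))
        if (equitable and hits != y) or (not equitable and hits < y):
            return False
    return True


nominations = [["a", "b", "a"], ["b", "b", "c"]]  # 2 levels, 3 agents
committees = [{"a"}, {"b"}]                        # k = 1
```

With these inputs, the sequence is egalitarian for x = 2, y = 1 but not equitable, because the first agent is satisfied at both levels rather than exactly one.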

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning

4276

Analyzing Intentional Behavior in Autonomous Agents Under Uncertainty

Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott Shapiro, Ruzica Piskac, Bettina Könighofer

Principled accountability for autonomous decision making in uncertain environments requires distinguishing intentional outcomes from negligent designs from true accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence for intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between intentional and accidental traffic collisions.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Accountability
AI Ethics, Trust, Fairness -> ETF: Moral decision making
Planning and Scheduling -> PS: Markov decision processes

4290

Measuring a Priori Voting Power in Liquid Democracy

Rachael Colley, Théo Delemazure, Hugo Gilbert

We introduce new power indices to measure the a priori voting power of voters in liquid democracy elections where an underlying network restricts delegations. We argue that our power indices are natural extensions of the standard Penrose-Banzhaf index in simple voting games. We show that computing the criticality of a voter is #P-hard even in weighted games with weights polynomially bounded in the size of the instance. However, for specific settings, such as when the underlying network is a bipartite or complete graph, recursive formulas can compute these indices for weighted voting games in pseudo-polynomial time. We highlight their theoretical properties and provide numerical results to illustrate how restricting the possible delegations can alter voters’ voting power.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4306

Unifying Core-Guided and Implicit Hitting Set based Optimization

Hannes Ihalainen, Jeremias Berg, Matti Järvisalo

Two of the most central algorithmic paradigms implemented in practical solvers for maximum satisfiability (MaxSAT) and other related declarative paradigms for NP-hard combinatorial optimization are the core-guided (CG) and implicit hitting set (IHS) approaches. We develop a general unifying algorithmic framework, based on the recent notion of abstract cores, that captures both CG and IHS computations. The framework hence offers a unified way of establishing the correctness of variants of the approaches, and can be instantiated in novel ways giving rise to new algorithmic variants combining aspects of the core-guided and IHS approaches; we illustrate the latter aspect by developing a prototype implementation of an algorithm variant for MaxSAT based on the framework.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Satisfiability

4309

Quantifying Consistency and Information Loss for Causal Abstraction Learning

Fabio Massimo Zennaro, Paolo Turrini, Theodoros Damoulas

Structural causal models provide a formalism to express causal relations between variables of interest. Models and variables can represent a system at different levels of abstraction, whereby relations may be coarsened and refined according to the need of a modeller. However, switching between different levels of abstraction requires evaluating a trade-off between the consistency and the information loss among different models. In this paper we introduce a family of interventional measures that an agent may use to evaluate such a trade-off. We consider four measures suited for different tasks, analyze their properties, and propose algorithms to evaluate and learn causal abstractions. Finally, we illustrate the flexibility of our setup by empirically showing how different measures and algorithmic choices may lead to different abstractions.

List of keywords

Uncertainty in AI -> UAI: Causality, structural causal models and causal inference

4311

Hyperspectral Image Denoising Using Uncertainty-Aware Adjustor

JiaHua Xiao, Xing Wei

Hyperspectral image (HSI) denoising has achieved promising results with the development of deep learning. A mainstream class of methods exploits the spatial-spectral correlations and recovers each band with the help of neighboring bands, collectively referred to as spectral auxiliary networks. However, these methods treat entire adjacent spectral bands equally. In theory, clearer and nearer bands carry more accurate spectral information than noisier and farther ones with higher uncertainties. How to achieve spectral adaptation and enhancement of each adjacent band has become an urgent problem in HSI denoising. This work introduces an uncertainty-aware adjustor (UA-Adjustor), which aims to adjust and enhance adjacent bands, enabling the spectral auxiliary networks to better restore noisy HSI while maintaining a balance between efficiency and performance. The workflow of UA-Adjustor can be divided into three stages: 1) Evaluating the importance of each adjacent band to the current denoising band. 2) Based on uncertainty perception, each pixel in adjacent bands is enhanced by aggregating their spatial content in a local spectral range. 3) The enhanced bands are adjusted with pixel-by-pixel uncertainty to produce the final spectral auxiliary bands. The proposed UA-Adjustor can be flexibly plugged into existing spectral auxiliary networks to improve denoising behavior at low cost. Extensive experimental results validate that the proposed solution can improve over recent state-of-the-art (SOTA) methods on both simulated and real-world benchmarks by a large margin. The source code will be available to the public.

List of keywords

Computer Vision -> CV: Applications
Computer Vision -> CV: Computational photography

4313

A Rule-Based Modal View of Causal Reasoning

Emiliano Lorini

We present a novel rule-based semantics for causal reasoning as well as a number of modal languages interpreted over it. They enable us to represent some fundamental concepts in the theory of causality including causal necessity and possibility, interventionist conditionals and Lewisian conditionals. We provide complexity results for the satisfiability checking and model checking problem for these modal languages. Moreover, we study the relationship between our rule-based semantics and the structural equation modeling (SEM) approach to causal reasoning, as well as between our rule-based semantics for causal conditionals and the standard semantics for belief base change.

List of keywords

Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Causality
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief

4322

FEAMOE: Fair, Explainable and Adaptive Mixture of Experts

Shubham Sharma, Jette Henderson, Joydeep Ghosh

Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a novel "mixture-of-experts" inspired framework aimed at learning fairer, more interpretable models that can also rapidly adjust to drifts in both the accuracy and the fairness of a classifier. We illustrate our framework for three popular fairness measures and demonstrate how drift can be handled with respect to these fairness constraints. Experiments on multiple datasets show that our framework as applied to a mixture of linear experts is able to perform comparably to neural networks in terms of accuracy while producing fairer models. We then use the large-scale HMDA dataset and show that various models trained on HMDA demonstrate drift and FEAMOE can ably handle these drifts with respect to all the considered fairness measures and maintain model accuracy. We also prove that the proposed framework allows for producing fast Shapley value explanations, which makes computationally efficient feature attribution based explanations of model decisions readily available via FEAMOE.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
AI Ethics, Trust, Fairness -> ETF: Bias
AI Ethics, Trust, Fairness -> ETF: Other

4339

GIDnets: Generative Neural Networks for Solving Inverse Design Problems via Latent Space Exploration

Carlo Adornetto, Gianluigi Greco

In a number of different fields, including Engineering, Chemistry and Physics, the design of technological tools and device structures is increasingly supported by deep-learning-based methods, which provide suggestions on crucial architectural choices based on the properties that these tools and structures should exhibit. The paper proposes a novel network architecture, named GIDnet, to address this so-called inverse design problem, which is based on exploring a suitably defined latent space associated with all possible designs. Among its distinguishing features, GIDnet is capable of identifying the most appropriate starting point for the exploration and of likely converging to a point corresponding to a feasible design. Results of a thorough experimental activity conducted, in particular, over a real photonic application show that GIDnet outperforms earlier approaches in the literature.

List of keywords

Machine Learning -> ML: Experimental methodology
Machine Learning -> ML: Applications
Machine Learning -> ML: Autoencoders

4350

Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem

Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer

This paper targets the perceptual task of separating the different interacting voices, i.e., monophonic melodic streams, in a polyphonic musical piece. We target symbolic music, where notes are explicitly encoded, and model this task as a Multi-Trajectory Tracking (MTT) problem from discrete observations, i.e., notes in a pitch-time space. Our approach builds a graph from a musical piece, by creating one node for every note, and separates the melodic trajectories by predicting a link between two notes if they are consecutive in the same voice/stream. This kind of local, greedy prediction is made possible by node embeddings created by a heterogeneous graph neural network that can capture inter- and intra-trajectory information. Furthermore, we propose a new regularization loss that encourages the output to respect the MTT premise of at most one incoming and one outgoing link for every node, favoring monophonic (voice) trajectories; this loss function might also be useful in other general MTT scenarios. Our approach does not use domain-specific heuristics, is scalable to longer sequences and a higher number of voices, and can handle complex cases such as voice inversions and overlaps. We reach new state-of-the-art results for the voice separation task on classical music of different styles.

List of keywords

Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MDA: Arts and creativity

4351

Choose your Data Wisely: A Framework for Semantic Counterfactuals

Konstantinos Thomas, Edmund Dervakos, Giorgos Filandrianos, Giorgos Stamou

Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits on a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end-user, as it could constitute an adversarial example (which is indistinguishable from the original data sample to an end-user). Instead, there are recent ideas that the notion of minimality in the context of counterfactuals should refer to the semantics of the data sample, and not to the feature space. In this work, we build on these ideas, and propose a framework that provides counterfactual explanations in terms of knowledge graphs. We provide an algorithm for computing such explanations (given some assumptions about the underlying knowledge), and quantitatively evaluate the framework with a user study.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Applications
Machine Learning -> ML: Explainable/Interpretable machine learning

4356

Front-to-End Bidirectional Heuristic Search with Consistent Heuristics: Enumerating and Evaluating Algorithms and Bounds

Lior Siag, Shahaf Shperberg, Ariel Felner, Nathan Sturtevant

Recent research on bidirectional heuristic search (BiHS) is based on the must-expand pairs theory (MEP theory), which describes which pairs of nodes must be expanded during the search to guarantee the optimality of solutions. A separate line of research in BiHS has proposed algorithms that use lower bounds that are derived from consistent heuristics during search. This paper links these two directions, providing a comprehensive unifying view and showing that both existing and novel algorithms can be derived from the MEP theory. An exhaustive set of all possible bounds is formulated, showing that any other bound will be dominated by the bounds in this set. Finally, the bounds are empirically evaluated by their individual and joint contribution to the efficiency of the search.

List of keywords

Search -> S: Heuristic search

4367

Measuring and Controlling Divisiveness in Rank Aggregation

Rachael Colley, Umberto Grandi, Cesar Hidalgo, Mariana Macedo, Carlos Navarrete

In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of these divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Multidisciplinary Topics and Applications -> MDA: Social sciences

4371

Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation

Shuodian Yu, Junqi Jin, Li Ma, Xiaofeng Gao, Xiaopeng WU, Haiyang Xu, Jian Xu

In large-scale live-stream recommendation, streamers are classified into different levels based on their popularity and other metrics for marketing. Several top streamers at the head level occupy a considerable amount of exposure, resulting in an unbalanced data distribution. A unified model for all levels that does not consider the imbalance issue can be biased towards head streamers and neglect the conflicts between levels. The lack of modeling of inter-level streamer correlations and intra-level streamer characteristics makes it difficult to estimate user behaviors. To tackle these challenges, we propose a curriculum multi-level learning framework for imbalanced recommendation. We separate model parameters into shared and level-specific ones to explore the generality among all levels and the discrepancy for each level, respectively. A level-aware gradient descent and a curriculum sampling scheduler are designed to capture the de-biased commonalities from all levels as the shared parameters. During the training of the specific parameters, a hardness-aware learning rate and an adaptor are proposed to dynamically balance the training process. Finally, shared and specific parameters are combined to form the final model weights and learned in a cooperative training framework. Extensive experiments on a live-stream production dataset demonstrate the superiority of the proposed framework.

List of keywords

Data Mining -> DM: Recommender systems
Machine Learning -> ML: Multi-task and transfer learning

4372

Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering

Meikai Bao, Kai Zhang, Ye Liu, Linan Yue, Qi Liu, longfei li, Jun Zhou

Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of the subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model’s problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in answering commonsense questions. Generally, skills refer to the learned ability to perform a specific task or activity, which is derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and utilizing skills as a critical driver in the CQA process. To be specific, DSCQA first employs a commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes a dynamic skill module to generate dynamic skill representations. Finally, in the perception and emphasis module, the various skill and dynamic skill representations are used to help the question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.

List of keywords

Natural Language Processing -> NLP: Question answering
Knowledge Representation and Reasoning -> KRR: Common-sense reasoning

4376

Graph-based Semi-supervised Local Clustering with Few Labeled Nodes

Zhaiming Shen, Ming-Jun Lai, Sheng Li

Local clustering aims at extracting a small local structure inside a graph without the necessity of knowing the entire graph structure. As the local structure is usually small in size compared to the entire graph, one can think of it as a compressive sensing problem where the indices of the target cluster can be thought of as a sparse solution to a linear system. In this paper, we apply this idea based on two pioneering works under the same framework and propose a new semi-supervised local clustering approach using only a few labeled nodes. Our approach improves on the existing works by making the initial cut the entire graph and hence overcomes a major limitation of the existing works, which is the low quality of the initial cut. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our approach.

List of keywords

Machine Learning -> ML: Clustering
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Semi-supervised learning

4381

On the Compilability of Bounded Numeric Planning

Nicola Gigante, Enrico Scala

Bounded numeric planning, where each numeric variable domain is bounded, is PSPACE-complete, but such a complexity result does not capture how hard it really is, since the same holds even for the practically much easier STRIPS fragment. A finer way to compare the difficulty of planning formalisms is through the notion of compilability, which, however, has been extensively studied only for classical planning by Nebel. This paper extends Nebel’s framework to the setting of bounded numeric planning. First, we identify a variety of numeric fragments differing in the degree of the polynomials involved and the availability of features such as conditional effects and Boolean conditions; then we study the compilability of these fragments to each other and to the classical fragments. Surprisingly, numeric and classical planning with conditional effects and Boolean conditions can be compiled both ways preserving plan size exactly, while the same does not hold when targeting pure STRIPS. Our study also reveals that numeric fragments cluster into two equivalence classes separated by the availability of incomplete initial state specifications, a feature that allows specifying uncertainty in the initial state.

List of keywords

Planning and Scheduling -> PS: Theoretical foundations of planning
Knowledge Representation and Reasoning -> KRR: Knowledge compilation

4383

Efficient and Equitable Deployment of Mobile Vaccine Distribution Centers

Da Qi Chen, Ann Li, George Li, Madhav Marathe, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Vaccines are extremely effective at preventing the spread of COVID-19 and continue to be the primary method for ending the pandemic in its current form. Lack of access was one of the reasons why many people didn’t get vaccinated early. As a result, states such as Virginia have deployed mobile vaccination sites in order to distribute vaccines across the county. Here we study the problem of deciding where these facilities should be placed and moved over time in order to minimize the distance each person needs to travel in order to be vaccinated. Traditional facility location models for this problem fail to incorporate the fact that our facilities are mobile (i.e., they can move over time). To this end, we instead model vaccine distribution as the Dynamic $k$-Supplier problem and give the first approximation algorithms for this problem. We then run extensive simulations on real-world datasets to show the efficacy of our methods. In particular, we find that natural baselines for Dynamic $k$-Supplier cannot take advantage of the mobility of the facilities, and perform worse than standard $k$-Supplier algorithms.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Resource allocation
Machine Learning -> ML: Clustering
Search -> S: Combinatorial search and optimisation

4389

Differentially Private Partial Set Cover with Applications to Facility Location

George Li, Dung Nguyen, Anil Vullikanti

Set Cover is a fundamental problem in combinatorial optimization which has been studied for many decades due to its various applications across multiple domains. In many of these domains, the input data consists of locations, relationships, and other sensitive information of individuals which may be leaked through the set cover output. Attempts have been made to design privacy-preserving algorithms to solve Set Cover under privacy constraints. Under differential privacy, it was proved by Gupta et al. that the Set Cover problem has strong impossibility results and no explicit forms of the output can be released to the public. In this work, we observe that these hardness results dissolve when we turn to the Partial Set Cover problem, where we only need to cover a $\rho\in(0,1)$ fraction of the elements. We show that this relaxation enables us to avoid the impossibility results, and give the first algorithm which outputs an explicit form of set cover with non-trivial utility guarantees under differential privacy. Using our algorithm as a subroutine, we design a differentially private bicriteria algorithm to solve a recently proposed facility location problem for vaccine distribution which generalizes the $k$-supplier with outliers. Our analysis shows that relaxing the covering requirement to serve only a $\rho\in(0,1)$ fraction of the population/universe also allows us to circumvent the inherent hardness of $k$-supplier and give the first non-trivial guarantees.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Security and privacy
Data Mining -> DM: Privacy-preserving data mining

4391

Self-Recover: Forecasting Block Maxima in Time Series from Predictors with Disparate Temporal Coverage using Self-Supervised Learning

Asadullah Hill Galib, Andrew McDonald, Pang-Ning Tan, Lifeng Luo

Forecasting the maximum value of a future time window is a challenging problem as it is difficult to infer the tail distribution of a random variable. For non-stationary time series, the historical observations alone are often insufficient to train robust models for predicting the block maxima. Domain-driven process-based models are often needed to supplement historical observation data in order to improve the forecast accuracy. Unfortunately, coupling the historical observations with process-based model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework to predict the block maxima of a time series by employing self-supervised learning to address the varying temporal data availability problem. Self-Recover uses a combination of contrastive and generative self-supervised schemes along with a denoising autoencoder component for imputing missing values. It combines a representation of the historical observations with a representation of the process model outputs using a residual learning approach while incorporating the generalized extreme value (GEV) distribution to ensure consistency of its block maxima predictions. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover in block maxima prediction compared to other state-of-the-art approaches.

List of keywords

Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning

4401

A Logic-based Approach to Contrastive Explainability for Neurosymbolic Visual Question Answering

Thomas Eiter, Tobias Geibinger, Nelson Higuera, Johannes Oetsch

Visual Question Answering (VQA) is a well-known problem for which deep learning is key. However, this poses a challenge for explaining answers to questions, all the more so if advanced notions like contrastive explanations (CEs) are to be provided. The latter explain why an answer has been reached in contrast to a different one and are attractive as they focus on reasons necessary to flip a query answer. We present a CE framework for VQA that uses a neurosymbolic VQA architecture which disentangles perception from reasoning. Once the reasoning part is provided as a logical theory, we use answer-set programming, in which CE generation can be framed as an abduction problem. We validate our approach on the CLEVR dataset, which we extend with more sophisticated questions to further demonstrate the robustness of the modular architecture. While we achieve top performance in terms of accuracy compared to related approaches, we can also produce CEs, which we illustrate for explanation, model debugging, and validation, showing the versatility of the declarative approach to reasoning.

List of keywords

Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Explainable/Interpretable machine learning

4402

Max Markov Chain

Yu Zhang, Mitchell Bucklew

In this paper, we introduce Max Markov Chain (MMC), a novel approximate model of High-order Markov Chain (HMC) for sequential data with sparse correlations among the states. MMC is desirable for domains where these sparse correlations are long-term but vary in their temporal stretches. We show that parameter optimization for MMC can be solved analytically, although it is generally intractable. However, based on this result, we derive an approximate solution that is highly efficient empirically. When compared with HMC and approximate HMC models, MMC combines better sample efficiency, model parsimony, and an outstanding computational advantage. Such a quality allows MMC to scale to large domains where the competing models would struggle to perform. We compare MMC with several baselines on synthetic and real-world datasets to demonstrate MMC as a valuable alternative for stochastic modeling.

List of keywords

Uncertainty in AI -> UAI: Bayesian networks
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Tractable probabilistic models

4407

Scalable Optimal Margin Distribution Machine

Yilin Wang, Nan Cao, Teng Zhang, Xuanhua Shi, Hai Jin

Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooted in the novel margin theory, which demonstrates better generalization performance than the traditional large margin based counterparts. Nonetheless, like other kernel methods, it suffers from the ubiquitous scalability problem regarding both computation time and memory. This paper proposes a scalable ODM, which can achieve nearly ten times speedup compared to the original ODM training method. For nonlinear kernels, we propose a novel distribution-aware partition method so that the local ODM trained on each partition is close to the global one and converges to it faster. When a linear kernel is applied, we extend a communication-efficient SVRG method to accelerate the training further. Extensive empirical studies validate that our proposed method is highly computationally efficient and almost never worsens the generalization.

List of keywords

Machine Learning -> ML: Classification
Data Mining -> DM: Big data and scalability
Machine Learning -> ML: Kernel methods

4413

Sequential Attention Source Identification based on Feature Representation

Dongpeng Hou, Chao Gao, Zhen Wang, Xuelong Li

Snapshot observation based source localization has been widely studied due to its accessibility and low cost. However, existing methods do not address the interaction of users in time-varying infection scenarios, so their accuracy decreases in heterogeneous interaction scenarios. To solve this critical issue, this paper proposes a sequence-to-sequence based localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI) based on an inductive learning idea. More specifically, the encoder focuses on aggregating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of prediction sources in different timestamps by a designed temporal attention mechanism. It’s worth mentioning that the inductive learning idea ensures that TGASI can detect the sources in new scenarios without any other prior knowledge, which proves the scalability of TGASI. Comprehensive experiments against the SOTA methods demonstrate the higher detection performance of TGASI, the necessity of its designed modules, and its scalability in different scenarios.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Web and social networks
Data Mining -> DM: Networks
Machine Learning -> ML: Sequence and graph learning

4418

Relative Inconsistency Measures for Indefinite Databases with Denial Constraints

Francesco Parisi, John Grant

Handling conflicting information is an important challenge in AI. Measuring inconsistency is an approach that provides ways to quantify the severity of inconsistency and helps in understanding the primary sources of conflicts. In particular, a relative inconsistency measure computes, by some criteria, the proportion of the knowledge base that is inconsistent. In this paper we investigate relative inconsistency measures for indefinite databases, which allow for indefinite or partial information that is formally expressed by means of disjunctive tuples. We introduce a postulate-based definition of relative inconsistency measure for indefinite databases with denial constraints, and investigate the compliance of some relative inconsistency measures with rationality postulates for indefinite databases as well as for the special case of definite databases. Finally, we investigate the complexity of the problem of computing the value of the proposed relative inconsistency measures as well as of the problems of deciding whether the inconsistency value is lower than, greater than, or equal to a given threshold for indefinite and definite databases.

List of keywords

Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning

4419

Learning to Act for Perceiving in Partially Unknown Environments

Leonardo Lamanna, Mohamadreza Faridghasemnia, Alfonso Gerevini, Alessandro Saetti, Alessandro Saffiotti, Luciano Serafini, Paolo Traverso

Autonomous agents embedded in a physical environment need the ability to correctly perceive the state of the environment from sensory data. In partially observable environments, certain properties can be perceived only in specific situations and from certain viewpoints that can be reached by the agent by planning and executing actions. For instance, to understand whether a cup is full of coffee, an agent, equipped with a camera, needs to turn on the light and look at the cup from the top. When the proper situations to perceive the desired properties are unknown, an agent needs to learn them and plan to get in such situations. In this paper, we devise a general method to solve this problem by evaluating the confidence of a neural network online and by using symbolic planning. We experimentally evaluate the proposed approach on several synthetic datasets, and show the feasibility of our approach in a real-world scenario that involves noisy perceptions and noisy actions on a real robot.

List of keywords

Robotics -> ROB: Cognitive robotics
Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Perception

4423

Computing twin-width with SAT and branch & bound

Andre Schidler, Stefan Szeider

The graph width-measure twin-width recently attracted great attention because of its solving power and generality. Many prominent NP-hard problems are tractable on graphs of bounded twin-width if a certificate for the twin-width bound is provided as an input. Bounded twin-width subsumes other prominent structural restrictions such as bounded treewidth and bounded rank-width. Computing such a certificate is NP-hard itself, already for twin-width 3, and the only known implemented algorithm for twin-width computation is based on a SAT encoding. In this paper, we propose two new algorithmic approaches for computing twin-width that significantly improve the state of the art. Firstly, we develop a SAT encoding that is far more compact than the known encoding and consequently scales to larger graphs. Secondly, we propose a new Branch & Bound algorithm for twin-width that, on many graphs, is significantly faster than the SAT encoding. It utilizes a sophisticated caching system for partial solutions. Both algorithmic approaches are based on new conceptual insights into twin-width computation, including the reordering of contractions.

List of keywords

Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction

4428

GTR: A Grafting-Then-Reassembling Framework for Dynamic Scene Graph Generation

Jiafeng Liang, Yuxin Wang, Zekun Wang, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

Dynamic scene graph generation aims to identify visual relationships <subject, predicate, object> in frames based on spatio-temporal contextual information in the video. Previous work implicitly models the spatio-temporal interaction simultaneously, which leads to entanglement of spatio-temporal contextual information. To address this, we propose a Grafting-Then-Reassembling framework (GTR), which explicitly extracts intra-frame spatial information and inter-frame temporal information in two separate stages to decouple spatio-temporal contextual information. Specifically, we first graft a static scene graph generation model to generate static visual relationships within frames. Then we propose a temporal dependency model to extract the temporal dependencies across frames, and explicitly reassemble static visual relationships into dynamic scene graphs. Experimental results show that GTR achieves state-of-the-art performance on the Action Genome dataset. Further analyses reveal that the reassembling stage is crucial to the success of our framework.

List of keywords

Computer Vision -> CV: Video analysis and understanding
Natural Language Processing -> NLP: Information extraction

4431

Preferences and Constraints in Abstract Argumentation

Gianvincenzo Alfano, Sergio Greco, Francesco Parisi, Irina Trubitsyna

In recent years there has been an increasing interest in extending Dung’s framework to facilitate the knowledge representation and reasoning process. In this paper, we present an extension of Abstract Argumentation Framework (AF) that allows for the representation of preferences over arguments’ truth values (3-valued preferences). For instance, we can express a preference stating that extensions where argument ‘a’ is false (i.e. defeated) are preferred to extensions where argument ‘b’ is false. Interestingly, such a framework generalizes the well-known Preference-based AF with no additional cost in terms of computational complexity for most of the classical argumentation semantics. Then, we further extend AF by considering both (3-valued) preferences and 3-valued constraints, that is, constraints of the form $\varphi \Rightarrow v$ or $v \Rightarrow \varphi$, where $\varphi$ is a logical formula and $v$ is a 3-valued truth value. After investigating the complexity of the resulting framework, as both constraints and preferences may represent subjective knowledge of agents, we extend our framework by considering multiple agents and study the complexity of deciding acceptance of arguments in this context.

List of keywords

Knowledge Representation and Reasoning -> KRR: Argumentation

4438

Probabilistic Rule Induction from Event Sequences with Logical Summary Markov Models

Debarun Bhattacharjya, Oktie Hassanzadeh, Ronny Luss, Keerthiram Murugesan

Event sequences are widely available across application domains and there is a long history of models for representing and analyzing such datasets. Summary Markov models are a recent addition to the literature that help identify the subset of event types that influence event types of interest to a user. In this paper, we introduce logical summary Markov models, which are a family of models for event sequences that enable interpretable predictions through logical rules that relate historical predicates to the probability of observing an event type at any arbitrary position in the sequence. We illustrate their connection to prior parametric summary Markov models as well as probabilistic logic programs, and propose new models from this family along with efficient greedy search algorithms for learning them from data. The proposed models outperform relevant baselines on most datasets in an empirical investigation on a probabilistic prediction task. We also compare the number of influencers that various logical summary Markov models learn on real-world datasets, and conduct a brief exploratory qualitative study to gauge the promise of such symbolic models around guiding large language models for predicting societal events.

List of keywords

Uncertainty in AI -> UAI: Tractable probabilistic models
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Time series and data streams

4441

Multi-Robot Coordination and Layout Design for Automated Warehousing

Yulun Zhang, Matthew Fontaine, Varun Bhatt, Stefanos Nikolaidis, Jiaoyang Li

With the rapid progress in Multi-Agent Path Finding (MAPF), researchers have studied how MAPF algorithms can be deployed to coordinate hundreds of robots in large automated warehouses. While most works try to improve the throughput of such warehouses by developing better MAPF algorithms, we focus on improving the throughput by optimizing the warehouse layout. We show that, even with state-of-the-art MAPF algorithms, commonly used human-designed layouts can lead to congestion for warehouses with large numbers of robots and thus have limited scalability. We extend existing automatic scenario generation methods to optimize warehouse layouts. Results show that our optimized warehouse layouts (1) reduce traffic congestion and thus improve throughput, (2) improve the scalability of the automated warehouses by doubling the number of robots in some cases, and (3) are capable of generating layouts with user-specified diversity measures.

List of keywords

Robotics -> ROB: Multi-robot systems
Planning and Scheduling -> PS: Applications
Search -> S: Evolutionary computation

4454

Fair and Efficient Allocation of Indivisible Chores with Surplus

Hannaneh Akrami, Bhaskar Ray Chaudhury, Jugal Garg, Kurt Mehlhorn, Ruta Mehta

We study fair division of indivisible chores among n agents with additive disutility functions. Two well-studied fairness notions for indivisible items are envy-freeness up to one/any item (EF1/EFX), and the standard notion of economic efficiency is Pareto optimality (PO). There is a noticeable gap between the results known for both EF1 and EFX in the goods and chores settings. The case of chores turns out to be much more challenging. We reduce this gap by providing slightly relaxed versions of the known results on goods for the chores setting. Interestingly, our algorithms run in polynomial time, unlike their analogous versions in the goods setting. We introduce the concept of k surplus in the chores setting, which means that up to k more chores are allocated to the agents and each of them is a copy of an original chore. We present a polynomial-time algorithm which gives EF1 and PO allocations with n-1 surplus. We relax the notion of EFX slightly and define tEFX, which requires that the envy from agent i to agent j is removed upon the transfer of any chore from i’s bundle to j’s bundle. We give a polynomial-time algorithm that, in the chores case for 3 agents, returns an allocation which is either proportional or tEFX. Note that proportionality is a very strong criterion in the case of indivisible items, and hence both notions we guarantee are desirable.
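
The two fairness notions above can be checked directly for a given allocation. The sketch below is only a verifier, not the paper's polynomial-time algorithm. It uses the standard EF1 definition for chores (envy disappears after removing one chore from the envious agent's own bundle) and the tEFX condition as described in the abstract; the function names and the data layout (`disutility[i][c]` as a non-negative cost, `allocation[i]` as a set of chore indices) are assumptions.

```python
def _cost(disutility, i, bundle):
    """Additive disutility of agent i for a bundle of chores."""
    return sum(disutility[i][c] for c in bundle)


def is_ef1_chores(disutility, allocation):
    """EF1: each agent stops envying after dropping one chore from its own bundle."""
    n = len(allocation)
    for i in range(n):
        own = _cost(disutility, i, allocation[i])
        # Dropping the chore that agent i finds most costly is the best case.
        best_drop = max((disutility[i][c] for c in allocation[i]), default=0)
        for j in range(n):
            if j != i and own - best_drop > _cost(disutility, i, allocation[j]):
                return False
    return True


def is_tefx_chores(disutility, allocation):
    """tEFX: transferring ANY chore from i's bundle to j's bundle removes i's envy."""
    n = len(allocation)
    for i in range(n):
        own = _cost(disutility, i, allocation[i])
        for j in range(n):
            if j == i:
                continue
            other = _cost(disutility, i, allocation[j])
            for c in allocation[i]:
                if own - disutility[i][c] > other + disutility[i][c]:
                    return False
    return True
```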

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation

4461

Efficient Computation of General Modules for ALC Ontologies

Hui Yang, Patrick Koopmann, Yue Ma, Nicole Bidoit

We present a method for extracting general modules for ontologies formulated in the description logic ALC. A module for an ontology is an ideally substantially smaller ontology that preserves all entailments for a user-specified set of terms. As such, it has applications such as ontology reuse and ontology analysis. Different from classical modules, general modules may use axioms not explicitly present in the input ontology, which allows for additional conciseness. So far, general modules have only been investigated for lightweight description logics. We present the first work that considers the more expressive description logic ALC. In particular, our contribution is a new method based on uniform interpolation supported by some new theoretical results. Our evaluation indicates that our general modules are often smaller than classical modules and uniform interpolants computed by the state-of-the-art, and compared with uniform interpolants, can be computed in significantly shorter time. Moreover, our method can be used for, and in fact, improves the computation of uniform interpolants and classical modules.

List of keywords

Knowledge Representation and Reasoning -> KRR: Description logics and ontologies

4473

Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

Video Text Spotting (VTS) aims at extracting text from videos. Collaborative text detection, tracking, and recognition are required to fulfill this task. Prior works generally tackle VTS with a strong, sophisticated pipeline. However, they ignore the rich semantic content among text instances in the videos. Specifically, we observe that text instances within one video frame usually exist in the same natural scene and thus share similar semantic meanings. A text instance that is wrongly predicted by visual models can therefore be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter (VLSpotter) that reads text Visually, Linguistically, and Semantically. Visually, we propose a plug-and-play text-focused super-resolution module (TFSR) to effectively reduce motion blur and enhance video quality. Linguistically, a language model is designed to subsequently process the visual recognition results to fit with natural language. Semantically, we propose a text-wise semantic reasoning module (TWSR) to model the semantic dependencies among text instances and reason for better results. Extensive experimental results on multiple VTS benchmarks confirm that the proposed VLSpotter observably outperforms existing state-of-the-art methods in end-to-end spotting.

List of keywords

Computer Vision -> CV: Vision and language
Computer Vision -> CV: Video analysis and understanding

4476

Discrete Two Player All-Pay Auction with Complete Information

Marcin Dziubiński, Krzysztof Jahn

We study the discrete two-player all-pay auction with complete information. We provide a full characterization of mixed-strategy Nash equilibria and show that they constitute a subset of the Nash equilibria of the discrete General Lotto game. We show that equilibria are not unique in general, but they are interchangeable and the sets of equilibrium strategies are convex. We also show that equilibrium payoffs are unique unless the valuation of at least one of the players is an even integer. If equilibrium payoffs are not unique, a continuum of equilibrium payoffs is possible.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games

4480

Adversarial Contention Resolution Games

Giorgos Chionas, Bogdan Chlebus, Dariusz Kowalski, Piotr Krysta

We study contention resolution on a shared channel modelled as a game with selfish players. There are $n$ agents attached to the channel and the adversary chooses some $k \leq n$ of them as players. Each agent participating as a player in a contention resolution game has a packet to transmit. A transmission is successful when it is performed as the only one at a round. Each player aims to minimize its packet latency. We introduce the notion of adversarial equilibrium, which incorporates adversarial selection of players. We develop efficient deterministic communication algorithms that are also adversarial equilibria. We estimate the price of anarchy in the contention resolution games with respect to adversarial equilibria.
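
The channel model in this abstract (a round succeeds only when exactly one player transmits) can be sketched as a small simulator. The code below is illustrative only and does not reproduce the paper's algorithms or its equilibrium notion; the function names, the 0-indexed rounds, and the assumption that a player stops transmitting once its packet gets through are all hypothetical choices.

```python
def simulate_channel(players, transmits, max_rounds):
    """Simulate a shared channel with deterministic transmission schedules.

    players:    iterable of agent ids chosen by the adversary
    transmits:  function (agent_id, round) -> bool
    Returns {agent_id: round in which its packet was delivered}.
    """
    pending = set(players)
    delivered = {}
    for t in range(max_rounds):
        senders = [p for p in pending if transmits(p, t)]
        # Success only if exactly one pending player transmits in this round.
        if len(senders) == 1:
            delivered[senders[0]] = t
            pending.remove(senders[0])
        if not pending:
            break
    return delivered


def round_robin(n):
    """Schedule in which agent i transmits exactly in rounds t with t % n == i."""
    return lambda agent, t: t % n == agent
```

For example, with `n = 4` agents and players `{1, 3}`, the round-robin schedule delivers the packets in rounds 1 and 3, whereas a schedule in which everyone always transmits never succeeds when two or more players are present.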

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Game Theory and Economic Paradigms -> GTEP: Mechanism design

4482

Fair Division of a Graph into Compact Bundles

Jayakrishnan Madathil

We study the computational complexity of fair division of indivisible goods in an enriched model: there is an underlying graph on the set of goods, and we have to allocate the goods (i.e., the vertices of the graph) to a set of agents in such a way that (a) the allocation is fair (for appropriate notions of fairness) and (b) each agent receives a bundle of goods (i.e., a subset of vertices) that induces a subgraph with a “nice structure.” This model has previously been studied in the literature with the nice structure being a connected subgraph [Bouveret et al., IJCAI 2017 and Bei et al., AAAI 2021]. While connectivity of the bundles may be a desirable property, it is hardly the only such property when it comes to the fair division of graphs. In many cases, demanding connected bundles for every agent may even be too stringent a requirement. In this paper, we investigate an alternative to connectivity in fair division. Inspired by the definition of a compact topological space, we introduce a notion of compact graphs, and look for fair allocations in which each agent receives a compact bundle of goods. Through compactness, we attempt to capture the idea that every agent must receive a few “clusters” of goods, where a cluster is a set of goods that are close to one another, for appropriate notions of closeness. We argue that compactness is a sufficiently expansive concept that it includes several natural requirements on the structure of bundles. The hardness of fair division problems in the general setting (i.e., without the graph on the set of goods) translates to our model as well. Nonetheless, with respect to fairness concepts such as proportionality, envy-freeness and maximin share guarantee, we prove a host of hardness and tractability results for various special cases of the input.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation

4484

On a Voter Model with Context-Dependent Opinion Adoption

Luca Becchetti, Vincenzo Bonifaci, Emilio Cruciani, Francesco Pasquale

Opinion diffusion is a crucial phenomenon in social networks, often underlying the way in which a collective of agents develops a consensus on relevant decisions. The voter model is a well-known theoretical model to study opinion spreading in social networks and structured populations. Its simplest version assumes that an updating agent will adopt the opinion of a neighboring agent chosen at random. The model allows us to study, for example, the probability that a certain opinion will fixate into a consensus opinion, as well as the expected time it takes for a consensus opinion to emerge. Standard voter models are oblivious to the opinions held by the agents involved in the opinion adoption process. We propose and study a context-dependent opinion spreading process on an arbitrary social graph, in which the probability that an agent abandons opinion $a$ in favor of opinion $b$ depends on both $a$ and $b$. We discuss the relations of the model with existing voter models and then derive theoretical results for both the fixation probability and the expected consensus time for two opinions, for both the synchronous and the asynchronous update models.
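
One way to picture a context-dependent voter process with asynchronous updates is the simulation sketch below. The update rule (a uniformly random agent looks at a uniformly random neighbor and switches from opinion $a$ to opinion $b$ with probability `adopt_prob[a][b]`) is an assumed reading of the abstract, not the paper's exact definition, and all names are hypothetical.

```python
import random


def async_step(opinions, neighbors, adopt_prob, rng):
    """One asynchronous update; mutates `opinions` in place."""
    v = rng.randrange(len(opinions))   # updating agent
    u = rng.choice(neighbors[v])       # neighbor chosen at random
    a, b = opinions[v], opinions[u]
    # Context dependence: the switch probability depends on both a and b.
    if a != b and rng.random() < adopt_prob[a][b]:
        opinions[v] = b


def run_until_consensus(opinions, neighbors, adopt_prob, rng, max_steps):
    """Return the number of steps until consensus, or None if not reached."""
    for step in range(max_steps):
        if len(set(opinions)) == 1:
            return step
        async_step(opinions, neighbors, adopt_prob, rng)
    return step + 1 if len(set(opinions)) == 1 else None
```

Setting every entry of `adopt_prob` to 1 gives the simplest voter model described above, in which the updating agent always copies the chosen neighbor.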

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation

4492

Incentivizing Recourse through Auditing in Strategic Classification

Andrew Estornell, Yatong Chen, Sanmay Das, Yang Liu, Yevgeniy Vorobeychik

The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Societal impact of AI
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Game Theory and Economic Paradigms -> GTEP: Other

4504

Explaining Answer-Set Programs with Abstract Constraint Atoms

Tobias Geibinger, Thomas Eiter

Answer-Set Programming (ASP) is a popular declarative reasoning and problem solving formalism. Due to the increasing interest in explainability, several explanation approaches have been developed for ASP. However, support for commonly used advanced language features of ASP, such as aggregates or choice rules, is still mostly lacking. We deal with explaining ASP programs containing Abstract Constraint Atoms, which encompass the above features and others. We provide justifications for the presence, or absence, of an atom in a given answer-set. To this end, we introduce several formal notions of justification in this setting, based on the one hand on a semantic characterisation utilising minimal partial models, and on the other hand on a more rule-guided approach. We provide complexity results for checking and computing such justifications, and discuss how the semantic and syntactic approaches relate and can be jointly used to offer more insight. Our results contribute to a basis for explaining commonly used language features and thus increase accessibility and usability of ASP as an AI tool.

List of keywords

Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning

4505

Sequence Learning using Equilibrium Propagation

Malyaban Bal, Abhronil Sengupta

Equilibrium Propagation (EP) is a powerful and more bio-plausible alternative to conventional learning frameworks such as backpropagation. The effectiveness of EP stems from the fact that it relies only on local computations and requires solely one kind of computational unit during both of its training phases, thereby enabling greater applicability in domains such as bio-inspired neuromorphic computing. The dynamics of the model in EP are governed by an energy function, and the internal states of the model consequently converge to a steady state following the state transition rules defined by that function. However, by definition, EP requires the input to the model (a convergent RNN) to be static in both phases of training. Thus it is not possible to design a model for sequence classification using EP with an LSTM- or GRU-like architecture. In this paper, we leverage recent developments in modern Hopfield networks to further understand energy-based models and develop solutions for complex sequence classification tasks using EP while satisfying its convergence criteria and maintaining its theoretical similarities with recurrent backpropagation. We explore the possibility of integrating modern Hopfield networks as an attention mechanism with convergent RNN models used in EP, thereby extending its applicability for the first time to two different sequence classification tasks in natural language processing, viz. sentiment analysis (IMDB dataset) and natural language inference (SNLI dataset).

List of keywords

Humans and AI -> HAI: Cognitive systems
Machine Learning -> ML: Attention models

4516

A Comparative Study of Ranking Formulas based on Consistency

Badran Raddaoui, Christian Straßer, Said Jabbour

Ranking is ubiquitous in everyday life. This paper is concerned with the problem of ranking the information in a knowledge base when the latter is possibly inconsistent. In particular, the key issue is to elicit a plausibility order on the formulas in an inconsistent knowledge base. We show how such an ordering can be obtained by using only the inherent structure of the knowledge base. We start by introducing, in a principled way, postulates that a reasonable ranking framework for formulas should satisfy. Then, a variety of ordering criteria are used to define plausibility orders over formulas. Finally, we study the behaviour of the different formula ranking semantics in terms of the proposed logical postulates as well as their (in)compatibility.

List of keywords

Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning

4518

Group Fairness in Set Packing Problems

Sharmila Duppala, Juan Luque, John Dickerson, Aravind Srinivasan

Kidney exchange programs (KEPs) typically seek to match incompatible patient-donor pairs together based on a utilitarian objective where the number of successful transplants is maximized, which penalizes certain classes of highly-sensitized (i.e., difficult to match) patients. Prioritizing the welfare of highly-sensitized (hard-to-match) patients has been studied in several previous works [Farnadi et al. 2021, Dickerson et al. 2014] as a natural fairness criterion. We formulate the KEP problem as $k$-set packing with a group fairness notion of proportionality fairness, namely fair $k$-set packing (FairSP). In this work we propose algorithms that take arbitrary proportionality vectors (i.e., policy-informed demands of how to prioritize different groups) and return an approximately fair solution with provable guarantees. Our main contributions are exact and approximate algorithms as well as hardness results for FairSP variants. Additionally, the tools we introduce serve to audit the price of fairness involved in prioritizing different groups in realistic KEPs and other $k$-set packing applications. We conclude with experiments on synthetic and realistic kidney exchange FairSP instances.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Game Theory and Economic Paradigms -> GTEP: Other
Search -> S: Combinatorial search and optimisation

4520

Discounting in Strategy Logic

Munyque Mittelmann, Aniello Murano, Laurent Perrussel

In recent years, there has been growing interest in formal strategic reasoning about multi-agent systems. Traditional verification techniques allow one to check whether there is a winning strategy for a group of agents (and possibly synthesize it) but they do not take into account the fact that satisfying a goal sooner is different from satisfying it after a long wait. Discounting is a key paradigm in economics that captures the intuition that the far-away future is not as important as the near future. In this paper, we augment Strategy Logic with future discounting, denoted SL[D]. We consider “until” operators with discounting functions: the satisfaction value of a specification in SL[D] is a value in [0, 1], where the longer it takes to fulfill eventuality requirements, the smaller the satisfaction value is. We motivate our approach with examples from Game Theory and study the complexity of model-checking SL[D]-formulas.
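The idea that a later fulfilment yields a smaller satisfaction value can be illustrated with a minimal sketch. It assumes an exponential discounting function d(i) = lam^i and an "until" semantics in the style of LTL with discounting, evaluated on a finite prefix of a single play; whether SL[D] uses exactly this form is an assumption, and the name `discounted_until` is hypothetical.

```python
from fractions import Fraction


def discounted_until(phi, psi, lam):
    """Value of a discounted 'phi until psi' on a finite prefix.

    phi, psi: per-step satisfaction values in [0, 1].
    lam: discount factor in (0, 1); step i is weighted by lam**i (assumed form).
    """
    best = Fraction(0)
    guard = Fraction(1)  # min over earlier steps j of lam**j * phi[j]; empty min is 1
    for i, (p, q) in enumerate(zip(phi, psi)):
        d = lam ** i
        # Fulfilling psi at step i is worth the discounted psi value,
        # capped by how well phi held on all earlier steps.
        best = max(best, min(d * q, guard))
        guard = min(guard, d * p)
    return best


half = Fraction(1, 2)
late = discounted_until([1, 1, 1, 1], [0, 0, 1, 0], half)   # psi holds at step 2
early = discounted_until([1, 1, 1, 1], [0, 1, 0, 0], half)  # psi holds at step 1
```

Under these assumptions, `early` is larger than `late`, matching the intuition that satisfying a goal sooner is better.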

List of keywords

Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning

4526

Convergence in Multi-Issue Iterative Voting under Uncertainty

Joshua Kavner, Reshef Meir, Francesca Rossi, Lirong Xia

We study the effect of strategic behavior in iterative voting for multiple issues under uncertainty. We introduce a model synthesizing simultaneous multi-issue voting with Meir, Lev, and Rosenschein (2014)’s local dominance theory and determine its convergence properties. After demonstrating that local dominance improvement dynamics may fail to converge, we present two sufficient model refinements that guarantee convergence from any initial vote profile for binary issues: constraining agents to have O-legal preferences and endowing agents with less uncertainty about issues they are modifying than others. Our empirical studies demonstrate that although cycles are common when agents have no uncertainty, introducing uncertainty makes convergence almost guaranteed in practice.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4527

Description Logics with Pointwise Circumscription

Federica Di Stefano, Magdalena Ortiz, Mantas Simkus

Circumscription is one of the most expressive means to bring non-monotonic (common-sense) reasoning features to first-order logic and related formalisms. But unfortunately, the high expressiveness comes with high computational costs. In Description Logics (DLs), which in most cases are decidable fragments of first-order logic, circumscription often leads to undecidability or very high computational complexity of the basic reasoning problems. In this paper, we consider a new notion of circumscription for DLs, aiming to preserve the key ideas and advantages of classic circumscription while mitigating its impact on the computational complexity of reasoning. Specifically, we introduce pointwise circumscription for DLs, which is not only intuitive in terms of knowledge representation, but also provides a sound approximation of classic circumscription. Our main idea is to replace the second-order quantification step with a series of (pointwise) local checks on all domain elements and their immediate neighborhood. After defining this notion, we study its computational complexity for standard reasoning tasks (like checking concept satisfiability, concept subsumption, instance checking), under a couple of syntactic restrictions. Our main positive decidability and complexity results are for TBoxes of modal depth 1 (i.e. without nesting of existential or universal quantifiers) in the DLs ALCIO and ALCI, with (co)NEXPTIME-completeness and EXPTIME-completeness, respectively. This class of TBoxes is large and relevant in practice (e.g., the popular DLs of the DL-Lite family also disallow nesting of quantifiers). These upper bounds are obtained by a sophisticated reduction to integer programming. Finally, as an additional justification for the considered syntactic restriction, we provide a strong undecidability result for pointwise circumscription with general TBoxes (with modal depth greater than 1).

List of keywords

Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning

4557

Computing (1+epsilon)-Approximate Degeneracy in Sublinear Time

Quinton Yong, Valerie King, Alex Thomo

The problem of finding the degeneracy of a graph is a subproblem of the k-core decomposition problem. In this paper, we present a (1 + epsilon)-approximate solution to the degeneracy problem which runs in O(n log n) time, sublinear in the input size for dense graphs, by sampling a small number of neighbors adjacent to high degree nodes. This improves upon the method by Bhattacharya et al., which implies a (4 + epsilon)-approximate ~O(n) solution to the degeneracy problem. Our algorithm can be extended to an O(n log n) time solution to the k-core decomposition problem. We also explore the use of our approximate algorithm as a technique for speeding up exact degeneracy computation. We prove theoretical guarantees of our algorithm and provide optimizations, which improve the running time of our algorithm in practice. Experiments on massive real-world web graphs show that our algorithm performs significantly faster than previous methods for computing degeneracy, including the 2022 exact degeneracy algorithm by Li et al.
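For reference, the degeneracy of a graph is the largest k for which the graph has a non-empty k-core. The sketch below is the classic exact peeling baseline (repeatedly remove a minimum-degree vertex), not the sublinear sampling algorithm described in the abstract; the function name `degeneracy` and the adjacency-dictionary input format are assumptions made for illustration.

```python
import heapq


def degeneracy(adj):
    """Exact degeneracy by min-degree peeling.

    adj: dict mapping each vertex to a set of neighbours (undirected, symmetric).
    Uses a heap with lazy deletion, so this runs in O(m log n), not sublinear time.
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed = set()
    best = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry, a fresher one exists
        removed.add(v)
        best = max(best, d)  # largest degree seen at removal time
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return best


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```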

List of keywords

Data Mining -> DM: Mining graphs
Multidisciplinary Topics and Applications -> MDA: Web and social networks

4572

Participatory Budgeting With Multiple Degrees of Projects And Ranged Approval Votes

Gogulapati Sreedurga

In an indivisible participatory budgeting (PB) framework, we have a limited budget that is to be distributed among a set of projects, by aggregating the preferences of voters for the projects. All the prior work on indivisible PB assumes that each project has only one possible cost. In this work, we let each project have a set of permissible costs, each reflecting a possible degree of sophistication of the project. Each voter approves a range of costs for each project, by giving an upper and lower bound on the cost that she thinks the project deserves. The outcome of a PB rule selects a subset of projects and also specifies their corresponding costs. We study different utility notions and prove that the existing positive results when every project has exactly one permissible cost can also be extended to our framework where a project has several permissible costs. We also analyze the fixed parameter tractability of the problem. Finally, we propose some important and intuitive axioms and analyze their satisfiability by different PB rules. We conclude by making some crucial remarks.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4580

Multi-objective Search via Lazy and Efficient Dominance Checks

Carlos Hernandez, Oren Salzman, Jorge Baier, Han Zhang, William Yeoh, Ariel Felner, Shao-Hung Chan, Sven Koenig

Multi-objective search can be used to model many real-world problems that require finding Pareto optimal paths from a specified start state to a specified goal state, while considering different cost metrics such as distance, time, and fuel. The performance of multi-objective search can be improved by making dominance checking (an operation necessary to determine whether or not a path dominates another) more efficient. This was shown in practice by BOA*, a state-of-the-art bi-objective search algorithm, which outperforms previously existing bi-objective search algorithms in part because it adopts a lazy approach towards dominance checking. EMOA*, a recent multi-objective search algorithm, generalizes BOA* to more-than-two objectives using AVL trees for dominance checking. In this paper, we first propose Linear-Time Multi-Objective A* (LTMOA*), a multi-objective search algorithm that implements more efficient dominance checking than EMOA* using simple data structures like arrays. We then propose an even lazier approach towards dominance checking, and the resulting algorithm, LazyLTMOA*, differs from EMOA* and LTMOA* by removing the dominance checking during node generation. Our experimental results show that LazyLTMOA* outperforms EMOA* by up to an order of magnitude in terms of runtime.
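To make the dominance-checking operation concrete, here is a minimal sketch assuming the standard Pareto dominance for minimization: a cost vector dominates another if it is no worse in every objective and strictly better in at least one. This is a naive quadratic filter, not the array-based check of LTMOA* or the AVL-tree check of EMOA*; the names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if cost vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs):
    """Keep only the undominated cost vectors, dropping exact duplicates."""
    front = []
    for c in costs:
        if any(dominates(f, c) or f == c for f in front):
            continue  # c is dominated by, or equal to, a kept vector
        # c survives, so discard anything it dominates before adding it
        front = [f for f in front if not dominates(c, f)]
        front.append(c)
    return front


# Hypothetical (distance, time) costs of five candidate paths.
paths = [(3, 4), (2, 5), (4, 4), (2, 4), (5, 1)]
front = pareto_front(paths)
```

Each insertion scans the whole current front, which is the cost that lazier or better-organized dominance checks aim to reduce.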

List of keywords

Search -> S: Heuristic search
Robotics -> ROB: Motion and path planning
Search -> S: Combinatorial search and optimisation

4586

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first performs masked language model pretraining with multilingual text-only data. Then we train this model on paired data in a supervised manner, while freezing a language-aware embedding layer. This allows inference even for languages not included in the paired data but present in the text-only data. Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language. All experiments were conducted using public datasets and the implementation will be made available for reproducibility.

List of keywords

Natural Language Processing -> NLP: Speech
Natural Language Processing -> NLP: Language models

4601

A Mathematical Runtime Analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III)

Simon Wietheger, Benjamin Doerr

The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is the most prominent multi-objective evolutionary algorithm for real-world applications. While it performs evidently well on bi-objective optimization problems, empirical studies suggest that it is less effective when applied to problems with more than two objectives. A recent mathematical runtime analysis confirmed this observation by proving that the NSGA-II, for an exponential number of iterations, misses a constant factor of the Pareto front of the simple 3-objective OneMinMax problem. In this work, we provide the first mathematical runtime analysis of the NSGA-III, a refinement of the NSGA-II aimed at better handling more than two objectives. We prove that the NSGA-III with sufficiently many reference points – a small constant factor more than the size of the Pareto front, as suggested for this algorithm – computes the complete Pareto front of the 3-objective OneMinMax benchmark in an expected number of O(n log n) iterations. This result holds for all population sizes (that are at least the size of the Pareto front). It shows a drastic advantage of the NSGA-III over the NSGA-II on this benchmark. The mathematical arguments used here and in previous work on the NSGA-II suggest that similar findings are likely for other benchmarks with three or more objectives.

List of keywords

Search -> S: Evolutionary computation
Search -> S: Heuristic search

4607

Probabilistic Temporal Logic for Reasoning about Bounded Policies

Nima Motamed, Natasha Alechina, Mehdi Dastani, Dragan Doder, Brian Logan

To build a theory of intention revision for agents operating in stochastic environments, we need a logic in which we can explicitly reason about their decision-making policies and those policies’ uncertain outcomes. Towards this end, we propose PLBP, a novel probabilistic temporal logic for Markov Decision Processes that allows us to reason about policies of bounded size. The logic is designed so that its expressive power is sufficient for the intended applications, whilst at the same time possessing strong computational properties. We prove that the satisfiability problem for our logic is decidable, and that its model checking problem is PSPACE-complete. This allows us to, e.g., algorithmically verify whether an agent’s intentions are coherent, or whether a specific policy satisfies safety and/or liveness properties.

List of keywords

Knowledge Representation and Reasoning -> KRR: Reasoning about actions
Agent-based and Multi-agent Systems -> MAS: Agent theories and models

4629

REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing

Meghyn Bienvenu, Gianluca Cima, Victor Gutierrez Basulto

In this paper, we investigate the problem of querying dirty databases that may suffer both from the presence of erroneous facts and from multiple names being used to refer to the same entity. While each of these issues has been widely studied in isolation, our contribution is a holistic framework for jointly deduplicating and repairing data, thereby taking advantage of the interdependencies between these two operations. Our REPLACE framework follows a declarative approach, utilizing logical rules to specify under which conditions a pair of entity references can or must be merged and logical constraints to specify consistency requirements. The semantics of REPLACE gives rise to a space of possible solutions, each consisting of a set of merges to perform and a set of facts to delete, among which we single out three notions of optimal solutions, based upon maximizing merges and minimizing deletions. As there can be multiple optimal solutions, we consider the classical notions of possible and certain query answers, as well as novel notions of most informative possible and certain answers, which provide a more compact and useful representation of the answers. We perform a detailed analysis of the data complexity of the central reasoning tasks of recognizing optimal solutions and (most informative) possible and certain answers, for each of the three notions of optimal solution and for both general and restricted specifications.

List of keywords

Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Multidisciplinary Topics and Applications -> MDA: Databases

4636

Pseudo-Labeling Enhanced by Privileged Information and its application to In Situ Sequencing Images

Marzieh Haghighi, Mario Cruz, Erin Weisbart, Beth Cimini, Avtar Singh, Julia Bauman, Maria Lozada, Sanam Kavari, JT Neal, Paul Blainey, Anne Carpenter, Shantanu Singh

Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. In this work, we frame a crucial problem in spatial transcriptomics – decoding barcodes from In-Situ-Sequencing (ISS) images – as a semi-supervised object detection (SSOD) problem. Most SSOD strategies rely on a small set of labeled data as a confident source of ground truth in a semi-supervised learning setting. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher’s pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well as the COCO benchmark using extra evidence provided by CLIP.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Life sciences
Multidisciplinary Topics and Applications -> MDA: Bioinformatics

4650

NeuPSL: Neural Probabilistic Soft Logic

Connor Pryor, Charles Dickens, Eriq Augustine, Alon Albalak, William Yang Wang, Lise Getoor

In this paper, we introduce Neural Probabilistic Soft Logic (NeuPSL), a novel neuro-symbolic (NeSy) framework that unites state-of-the-art symbolic reasoning with the low-level perception of deep neural networks. To model the boundary between neural and symbolic representations, we propose a family of energy-based models, NeSy Energy-Based Models, and show that they are general enough to include NeuPSL and many other NeSy approaches. Using this framework, we show how to seamlessly integrate neural and symbolic parameter learning and inference in NeuPSL. Through an extensive empirical evaluation, we demonstrate the benefits of using NeSy methods, achieving upwards of 30% improvement over independent neural network models. On a well-established NeSy task, MNIST-Addition, NeuPSL demonstrates its joint reasoning capabilities by outperforming existing NeSy approaches by up to 10% in low-data settings. Furthermore, NeuPSL achieves a 5% boost in performance over state-of-the-art NeSy methods in a canonical citation network task with up to a 200 times speed up.

List of keywords

Machine Learning -> ML: Symbolic methods
Machine Learning -> ML: Structured prediction

4651

Modeling with Homophily Driven Heterogeneous Data in Gossip Learning

Abhirup Ghosh, Cecilia Mascolo

Training deep learning models on data distributed and local to edge devices such as mobile phones is a prominent recent research direction. In a Gossip Learning (GL) system, each participating device maintains a model trained on its local data and iteratively aggregates it with the models from its neighbours in a communication network. While the fully distributed operation in GL comes with natural advantages over the centralized orchestration in Federated Learning (FL), its convergence becomes particularly slow when the data distribution is heterogeneous and aligns with the clustered structure of the communication network. These characteristics are pervasive across practical applications as people with similar interests (thus producing similar data) tend to create communities. This paper proposes a data-driven softmax distribution based neighbor weighting strategy for aggregating the models: this enables faster diffusion of knowledge across the communities in the network and leads to quicker convergence. We augment the method to make it computationally efficient and fair: the devices quickly converge to the same model. We evaluate our model on real and synthetic datasets that we generate using a novel generative model for communication networks with heterogeneous data. Our exhaustive empirical evaluation verifies that our proposed method attains a faster convergence rate than the baselines. For example, the median test accuracy for a decentralized bird classifier application reaches 81% with our proposed method within 80 rounds, whereas the baseline only reaches 46%.

List of keywords

Machine Learning -> ML: Federated learning

4653

Hierarchical Apprenticeship Learning For Disease Progression Modeling

Xi Yang, Ge Gao, Min Chi

Disease progression modeling (DPM) plays an important role in characterizing historical progressive pathways and predicting potential future risks. Apprenticeship learning (AL) induces decision-making policies via observing and imitating experts’ demonstrations. In this work, we investigate incorporating the patterns extracted from AL for DPM. Specifically, we propose a Time-aware Hierarchical EM Energy-based Subsequence (THEMES) AL framework. To the best of our knowledge, this is the first work incorporating AL-derived intervention patterns for DPM and its effectiveness is evaluated on the task of early prediction for an extremely challenging condition: septic shock. Our results show that incorporating the THEMES-derived intervention patterns can further improve the performance of DPM.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Health and medicine

4655

Temporal Network Creation Games

Davide Bilò, Sarel Cohen, Tobias Friedrich, Hans Gawendowicz, Nicolas Klodt, Pascal Lenzner, George Skretas

Most networks are not static objects, but instead they change over time. This observation has sparked rigorous research on temporal graphs in recent years. In temporal graphs, we have a fixed set of nodes and the connections between them are only available at certain time steps. This gives rise to a plethora of algorithmic problems on such graphs, most prominently the problem of finding temporal spanners, i.e., the computation of subgraphs that guarantee all-pairs reachability via temporal paths. To the best of our knowledge, only centralized approaches for the solution of this problem are known. However, many real-world networks are not shaped by a central designer but instead they emerge and evolve by the interaction of many strategic agents. This observation is the driving force of the recent intensive research on game-theoretic network formation models. In this work we bring together these two recent research directions: temporal graphs and game-theoretic network formation. As a first step into this new realm, we focus on a simplified setting where a complete temporal host graph is given and the agents, corresponding to its nodes, selfishly create incident edges to ensure that they can reach all other nodes via temporal paths in the created network. This yields temporal spanners as equilibria of our game. We prove results on the convergence to and the existence of equilibrium networks, on the complexity of finding best agent strategies, and on the quality of the equilibria. By taking these first important steps, we uncover challenging open problems that call for an in-depth exploration of the creation of temporal graphs by strategic agents.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games

4658

Ties in Multiwinner Approval Voting

Łukasz Janeczko, Piotr Faliszewski

We study the complexity of deciding if there is a tie in a given approval-based multiwinner election, as well as the complexity of counting tied winning committees. We consider a family of Thiele rules, their greedy variants, Phragmén’s sequential rule, and Method of Equal Shares. For most cases, our problems are computationally hard, but for sequential rules we find an FPT algorithm for discovering ties (parameterized by the committee size). We also show experimentally that in elections of moderate size ties are quite frequent.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4663

Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning

Elizaveta Tennant, Steve Hailes, Mirco Musolesi

Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner’s Dilemma, Volunteer’s Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
AI Ethics, Trust, Fairness -> ETF: Moral decision making

4665

Cross-community Adapter Learning (CAL) to Understand the Evolving Meanings of Norm Violation

Thiago Freitas dos Santos, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Nardine Osman, Marco Schorlemmer

Cross-community learning incorporates data from different sources to leverage task-specific solutions in a target community. This approach is particularly interesting for low-resource or newly created online communities, where data formalizing interactions between agents (community members) are limited. In such scenarios, a normative system that intends to regulate online interactions faces the challenge of continuously learning the meaning of norm violation as communities’ views evolve, either with changes in the understanding of what it means to violate a norm or with the emergence of new violation classes. To address this issue, we propose the Cross-community Adapter Learning (CAL) framework, which combines adapters and transformer-based models to learn the meaning of norm violations expressed as textual sentences. Additionally, we analyze the differences in the meaning of norm violations between communities, using Integrated Gradients (IG) to understand the inner workings of our model and calculate a global relevance score that indicates the relevance of words for violation detection. Results show that cross-community learning enhances CAL’s performance while explaining the differences in the meaning of norm-violating behavior based on community members’ feedback. We evaluate our proposal in a small set of interaction data from Wikipedia, in which the norm prohibits hate speech.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Normative systems
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Incremental learning

4671

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Aamal Hussain, Francesco Belardinelli, Georgios Piliouras

The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents’ tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are ‘close’ to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the ‘distance’ to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the ‘nearest’ network zero-sum game can be found efficiently. Importantly, our theoretical guarantees are widely applicable in different game settings, regardless of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning

4672

Schelling Games with Continuous Types

Davide Bilò, Vittorio Bilò, Michelle Luise Döring, Pascal Lenzner, Louise Molitor, Jonas Schmidt

In most major cities and urban areas, residents form homogeneous neighborhoods along ethnic or socioeconomic lines. This phenomenon is widely known as residential segregation and has been studied extensively. Fifty years ago, Schelling proposed a landmark model that explains residential segregation in an elegant agent-based way. There, agents of two types are placed on a grid. They are content with their location if they have at least a tau-fraction of same-type neighbors, for some tau in [0,1]. Discontent agents jump to empty cells or swap their location with other discontent agents. A recent stream of papers analyzed Schelling’s model using game-theoretic approaches. However, all these works considered models with a given number of discrete types modeling different ethnic groups. We focus on segregation caused by non-categorical attributes, such as household income or position in a political left-right spectrum. For this, we consider agent types that can be represented as real numbers in the interval [0,1]. This opens up a great variety of reasonable models and, as a proof of concept, we focus on several natural candidates. In particular, we consider agents that evaluate their location by the average type-difference or the maximum type-difference to their neighbors, or by having a certain tolerance range for type-values of neighboring agents. We study the existence and computation of equilibria and provide bounds on the Price of Anarchy and Stability. Also, we present simulation results that compare our models and shed light on the obtained equilibria for our variants.
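Two of the location evaluations named above, the average and the maximum type-difference to neighbours, can be sketched as follows. This is only an illustration of the abstract's wording for types in [0,1]: the function names are hypothetical, and assigning cost 0 to an agent with no neighbours is an assumption, not something the abstract states.

```python
def avg_type_difference(own, neighbours):
    """Average absolute difference between an agent's type and its neighbours' types."""
    if not neighbours:
        return 0.0  # assumption: an isolated agent incurs no cost
    return sum(abs(own - t) for t in neighbours) / len(neighbours)


def max_type_difference(own, neighbours):
    """Largest absolute difference between an agent's type and any neighbour's type."""
    return max((abs(own - t) for t in neighbours), default=0.0)


# An agent of type 0.5 next to neighbours of types 0.25 and 1.0.
avg_cost = avg_type_difference(0.5, [0.25, 1.0])
max_cost = max_type_difference(0.5, [0.25, 1.0])
```

In this example the two evaluations disagree on how bad the location is, since one distant neighbour dominates the maximum but is diluted in the average.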

List of keywords

Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence

4691

Ordinal Hedonic Seat Arrangement under Restricted Preference Domains: Swap Stability and Popularity

Anaëlle Wilczynski

We study a variant of hedonic games, called hedonic seat arrangements [Bodlaender et al., 2020], where the goal is not to partition the agents into coalitions but to assign them to vertices of a given graph; their satisfaction is then based on the subset of agents in their neighborhood. We focus on ordinal hedonic seat arrangements where the preferences over neighborhoods are deduced from ordinal preferences over single agents and a given preference extension. In such games and for different types of preference restrictions and extensions, we investigate the existence of arrangements satisfying stability w.r.t. swaps of positions in the graph or the well-known optimality concept of popularity.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Resource allocation

4714

BRExIt: On Opponent Modelling in Expert Iteration

Daniel Hernandez, Hendrik Baier, Michael Kaisers

Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt’s algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Reinforcement learning
Search -> S: Game playing

4724

A Symbolic Approach to Computing Disjunctive Association Rules from Data

Said Jabbour, Badran Raddaoui, Lakhdar Sais

Association rule mining is one of the most well-studied and important knowledge discovery tasks in data mining. In this paper, we first introduce the k-disjunctive support based itemset, a generalization of the traditional model of itemset by allowing the absence of up to k items in each transaction matching the itemset. Then, to discover more expressive rules from data, we define the concept of (k, k′)-disjunctive support based association rules by considering the antecedent and the consequent of the rule as k-disjunctive and k′-disjunctive support based itemsets, respectively. Second, we provide a polynomial-time reduction of both the problems of mining k-disjunctive support based itemsets and (k, k′)-disjunctive support based association rules to the propositional satisfiability model enumeration task. Finally, we show through an extensive campaign of experiments on several popular real-life datasets the efficiency of our proposed approach.
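
One possible reading of the k-disjunctive support described above is sketched below: a transaction matches an itemset if at most k of the itemset's items are missing from it. The function name and the toy data are hypothetical, the paper's formal definition may differ in details, and the SAT-based enumeration itself is not shown.

```python
def k_disjunctive_support(itemset, transactions, k):
    """Count transactions that miss at most k items of `itemset` (assumed reading)."""
    items = set(itemset)
    return sum(1 for t in transactions if len(items - set(t)) <= k)


# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"d"}]
supports = [k_disjunctive_support({"a", "b", "c"}, transactions, k) for k in range(4)]
```

With k = 0 this reduces to the classical support of an itemset, and the count can only grow as k increases.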

List of keywords

Data Mining -> DM: Frequent pattern mining
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Satisfiability

4730

Spatially Covariant Lesion Segmentation

HANG Zhang, Rongguang Wang, Jinwei Zhang, Dongdong Liu, Chao Li, Jiahao Li

Patterns in medical images are usually more structured than those in natural images, which adds flexibility and elasticity to resource-limited clinical applications because proper priors can be injected into neural networks. In this paper, we propose a spatially covariant pixel-aligned classifier (SCP) to trade off computational efficiency against accuracy for lesion segmentation in the human brain and liver. SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights; the parameters of this function are learned along with the backbone network and later used to generate network weights that capture spatially variant contextual information. We demonstrate the effectiveness and efficiency of the proposed SCP on two lesion segmentation tasks with different imaging sources: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography. The network using SCP achieves 23.8%, 64.9% and 74.7% reductions in GPU memory usage, FLOPs, and network size, respectively, with similar or better accuracy for lesion segmentation.

List of keywords

Computer Vision -> CV: Segmentation
Computer Vision -> CV: Biomedical image analysis
Machine Learning -> ML: Convolutional networks

4732

On Discovering Interesting Combinatorial Integer Sequences

Martin Svatos, Peter Jung, Jan Tóth, Yuyi Wang, Ondrej Kuzelka

We study the problem of generating interesting integer sequences with a combinatorial interpretation. For this we introduce a two-step approach. In the first step, we generate first-order logic sentences which define some combinatorial objects, e.g., undirected graphs, permutations, matchings, etc. In the second step, we use algorithms for lifted first-order model counting to generate integer sequences that count the objects encoded by the first-order logic formulas generated in the first step. For instance, if the first-order sentence defines permutations then the generated integer sequence is the sequence of factorial numbers $n!$. We demonstrate that our approach is able to generate interesting new sequences by showing that a non-negligible fraction of the automatically generated sequences can actually be found in the On-Line Encyclopedia of Integer Sequences (OEIS) while generating many other similar sequences which are not present in OEIS and which are potentially interesting. A key technical contribution of our work is the method for generation of first-order logic sentences which is able to drastically prune the space of sentences by discarding a large fraction of sentences which would lead to redundant integer sequences.
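
To make the second step concrete, the sketch below counts the models of two simple "sentences" by brute-force enumeration over a domain of size n. This is only an illustration of what is being counted: the paper relies on lifted first-order model counting, which avoids this exponential enumeration, and the function names here are hypothetical.

```python
from itertools import product


def count_permutations(n):
    """Count functions f: {0..n-1} -> {0..n-1} that are injective, hence bijective."""
    return sum(1 for f in product(range(n), repeat=n) if len(set(f)) == n)


def count_undirected_graphs(n):
    """Count binary relations on n elements that are irreflexive and symmetric."""
    pairs = [(i, j) for i in range(n) for j in range(n)]
    count = 0
    for bits in product((False, True), repeat=len(pairs)):
        edge = dict(zip(pairs, bits))
        irreflexive = all(not edge[(i, i)] for i in range(n))
        symmetric = all(edge[(i, j)] == edge[(j, i)] for i, j in pairs)
        count += irreflexive and symmetric
    return count


# Each list is the start of an integer sequence indexed by the domain size n.
permutation_sequence = [count_permutations(n) for n in range(6)]
graph_sequence = [count_undirected_graphs(n) for n in range(4)]
```

The first list reproduces the factorial numbers mentioned in the abstract; the second counts labeled simple graphs, i.e. 2 raised to the number of unordered vertex pairs.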

List of keywords

Knowledge Representation and Reasoning -> KRR: Other

4734

Explanation-Guided Reward Alignment

Saaduddin Mahmud, Sandhya Saisubramanian, Shlomo Zilberstein

When agents learn from observed trajectories, the acquired reward function is often misaligned because multiple functions may be consistent with the observations. Operating based on such proxy rewards may be unsafe. Further, black-box representations make it difficult to verify the learned rewards and prevent harmful behavior. Examining explanations generated from a learned reward function can help detect misalignment, as it gives insight into the reasoning behind reward estimations and reveals failure cases in novel scenarios. We present a framework for verifying and improving reward alignment using explanations. The problem is formulated as inverse reinforcement learning from ranked trajectories. Verification tests created from the trajectory dataset are used to iteratively validate and improve reward alignment. The agent explains its learned reward and a human signals whether the explanation passes the test. If it fails, the agent presents alternative explanations to acquire feedback that is used to improve the learned reward. We analyze the efficiency of our approach in improving reward alignment using different types of explanations and demonstrate its effectiveness in five domains.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Reinforcement learning

4757

Context-Aware Feature Selection and Classification

Juanyan Wang, Mustafa Bilgic

We propose a joint model that performs instance-level feature selection and classification. For a given case, the joint model first skims the full feature vector, decides which features are relevant for that case, and makes a classification decision using only the selected features, resulting in compact, interpretable, and case-specific classification decisions. Because the selected features depend on the case at hand, we refer to this approach as context-aware feature selection and classification. The model can be trained on instances that are annotated by experts with both class labels and instance-level feature selections, so it can select instance-level features that humans would use. Experiments on several datasets demonstrate that the proposed model outperforms eight baselines on a combined classification and feature selection measure, and is able to better emulate the ground-truth instance-level feature selections.

List of keywords

Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Machine Learning -> ML: Classification
Machine Learning -> ML: Explainable/Interpretable machine learning

4760

Diversity, Agreement, and Polarization in Elections

Piotr Faliszewski, Andrzej Kaczmarczyk, Krzysztof Sornat, Stanisław Szufa, Tomasz Wąs

We consider the notions of agreement, diversity, and polarization in ordinal elections (that is, in elections where voters rank the candidates). While (computational) social choice offers good measures of agreement between the voters, such measures for the other two notions are lacking. We attempt to rectify this issue by designing appropriate measures, providing means of their (approximate) computation, and arguing that they, indeed, capture diversity and polarization well. In particular, we present "maps of preference orders" that highlight relations between the votes in a given election and which help in making arguments about their nature.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

4765

Local and Global: Temporal Question Answering via Information Fusion

Yonghao Liu, Di Liang, Mengyu Li, Fausto Giunchiglia, Ximing Li, Sirui Wang, Wei Wu, Lan Huang, Xiaoyue Feng, Renchu Guan

Many models that leverage knowledge graphs (KGs) have recently demonstrated remarkable success in question answering (QA) tasks. In the real world, many facts contained in KGs are time-constrained. Thus, temporal KGQA has received increasing attention. Despite the fruitful efforts of previous models in temporal KGQA, they still have several limitations. (I) They neither emphasize the graphs’ structural information between entities in KGs nor explicitly utilize a multi-hop relation path through graph neural networks to enhance answer prediction. (II) They adopt pre-trained language models (LMs) to obtain question representations, focusing merely on the global information related to the question while not highlighting the local information of the entities in KGs. To address these limitations, we introduce a novel model that simultaneously explores local information and Global information for the task of temporal KGQA (LGQA). Specifically, we first introduce an auxiliary task in the temporal KG embedding procedure to make timestamp embeddings time-order aware. Then, we design information fusion layers that effectively incorporate local and global information to augment question understanding. We conduct extensive experiments on two benchmarks, and LGQA significantly outperforms previous state-of-the-art models, especially in difficult questions. Moreover, LGQA can generate interpretable and trustworthy predictions.

List of keywords

Natural Language Processing -> NLP: Question answering

4766

Toward Convex Manifolds: A Geometric Perspective for Deep Graph Clustering of Single-cell RNA-seq Data

Nairouz Mrabah, Mohamed Mahmoud Amar, Mohamed Bouguessa, Abdoulaye Banire Diallo

The deep clustering paradigm has shown great potential for discovering complex patterns that can reveal cell heterogeneity in single-cell RNA sequencing data. This paradigm involves two training phases: pretraining based on a pretext task and fine-tuning using pseudo-labels. Although current models yield promising results, they overlook the geometric distortions that regularly occur during the training process. More precisely, the transition between the two phases results in a coarse flattening of the latent structures, which can deteriorate the clustering performance. In this context, existing methods perform Euclidean-based embedding clustering without ensuring the flatness and convexity of the latent manifolds. To address this problem, we incorporate two mechanisms. First, we introduce an overclustering loss to flatten the local curves. Second, we propose an adversarial mechanism to adjust the global geometric configuration. The second mechanism gradually transforms the latent structures into convex ones. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Machine Learning -> ML: Clustering
Machine Learning -> ML: Unsupervised learning

4777

Vision Language Navigation with Knowledge-driven Environmental Dreamer

Fengda Zhu, Vincent CS Lee, Xiaojun Chang, Xiaodan Liang

Vision-language navigation (VLN) requires an agent to perceive visual observation in a house scene and navigate step-by-step following natural language instruction. Due to the high cost of data annotation and data collection, current VLN datasets provide limited instruction-trajectory data samples. Learning vision-language alignment for VLN from limited data is challenging since visual observation and language instruction are both complex and diverse. Previous works only generate augmented data based on original scenes while failing to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a method that leverages the knowledge of the embodied environment and generates unseen scenes for a navigation agent to learn. Generating an unseen environment with texture consistency and structure consistency is challenging. To address this problem, we incorporate three knowledge-driven regularization objectives into the KED and adopt a reweighting mechanism for self-adaptive optimization. Our KED method is able to generate unseen embodied environments without extra annotations. We use KED to successfully generate 270 houses and 500K instruction-trajectory pairs. The navigation agent with the KED method outperforms the state-of-the-art methods on various VLN benchmarks, such as R2R, R4R, and RxR. Both qualitative and quantitative experiments show that our proposed KED method is able to generate high-quality augmentation data with texture consistency and structure consistency.

List of keywords

Computer Vision -> CV: Vision and language
Robotics -> ROB: Robotics and vision

4783

Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives

Lening Li, Hazhar Rahmani, Jie Fu

This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows succinctly specifying temporal objectives with corresponding preferences for accomplishing each temporal task. The finite traces that describe the system’s behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Planning and Scheduling -> PS: Markov decision processes

4787

Discriminative-Invariant Representation Learning for Unbiased Recommendation

Hang Pan, Jiawei Chen, Fuli Feng, Wentao Shi, junkang Wu, Xiangnan He

Selection bias hinders recommendation models from learning unbiased user preference. Recent works empirically reveal that pursuing invariant user and item representation across biased and unbiased data is crucial for counteracting selection bias. However, our theoretical analysis reveals that simply optimizing representation invariance is insufficient for addressing the selection bias: recommendation performance is bounded by both representation invariance and discriminability. Worse still, current invariant representation learning methods in recommendation neglect or even hurt the representation discriminability due to data sparsity and label shift. In this light, we propose a new Discriminative-Invariant Representation Learning framework for unbiased recommendation, which incorporates label-conditional clustering and prior-guided contrasting into conventional invariant representation learning to mitigate the impact of data sparsity and label shift, respectively. We conduct extensive experiments on three real-world datasets, validating the rationality and effectiveness of the proposed framework.

List of keywords

Data Mining -> DM: Recommender systems
AI Ethics, Trust, Fairness -> ETF: Bias

4799

Safety Verification and Universal Invariants for Relational Action Bases

Silvio Ghilardi, Alessandro Gianola, Marco Montali, Andrey Rivkin

Modeling and verification of dynamic systems operating over a relational representation of states are increasingly investigated problems in AI, Business Process Management and Database Theory. To make these systems amenable to verification, the amount of information stored in each state needs to be bounded, or restrictions are imposed on the preconditions and effects of actions. We lift these restrictions by introducing the framework of Relational Action Bases (RABs), which generalizes existing frameworks and in which unbounded relational states are evolved through actions that can (1) quantify both existentially and universally over the data, and (2) use arithmetic constraints. We then study parameterized safety of RABs via (approximated) SMT-based backward search, singling out essential meta-properties of the resulting procedure, and showing how it can be realized by an off-the-shelf combination of existing verification modules of the state-of-the-art MCMT model checker. We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes. Finally, we show how universal invariants can be exploited to make this procedure fully correct.

List of keywords

Knowledge Representation and Reasoning -> KRR: Causality
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving

4804

Incentive-compatible Selection for One or Two Influentials

Yao Zhang, Yuxin Zhao, Dengji Zhao

Selecting influentials in networks against strategic manipulations has attracted many researchers’ attention and it also has many practical applications. Here, we aim to select one or two influentials in terms of progeny (the influential power) and prevent agents from manipulating their edges (incentive compatibility). The existing studies mostly focused on selecting a single influential for this setting. Zhang et al. [2021] studied the problem of selecting one agent and proved an upper bound of $1/(1+\ln 2)$ to approximate the optimal selection. In this paper, we first design a mechanism to actually reach the bound. Then, we move this forward to choosing two agents and propose a mechanism to achieve an approximation ratio of $(3+\ln2)/(4(1+\ln2))$ ($\approx 0.54$).
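
As a quick numerical check of the two closed-form ratios quoted above, the expressions can be evaluated directly. The variable names are hypothetical; only the formulas come from the abstract.

```python
from math import log

# Upper bound for selecting a single agent: 1 / (1 + ln 2).
one_agent_bound = 1 / (1 + log(2))

# Approximation ratio reported for selecting two agents: (3 + ln 2) / (4 (1 + ln 2)).
two_agent_ratio = (3 + log(2)) / (4 * (1 + log(2)))

print(f"one agent:  {one_agent_bound:.4f}")
print(f"two agents: {two_agent_ratio:.4f}")
```

The single-agent bound evaluates to roughly 0.59, and the two-agent ratio lies between 0.54 and 0.55, in line with the approximate value given in the abstract.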

List of keywords

Game Theory and Economic Paradigms -> GTEP: Mechanism design

4826

Case-based Reasoning with Language Models for Classification of Logical Fallacies

Zhivar Sourati Hassan Zadeh, Filip Ilievski, Hông Ân Sandlin, Alain Mermoud

The ease and the speed of spreading misinformation and propaganda on the Web motivate the need to develop trustworthy technology for detecting fallacies in natural language arguments. However, state-of-the-art language modeling methods exhibit a lack of robustness on tasks like logical fallacy classification that require complex reasoning. In this paper, we propose a Case-Based Reasoning method that classifies new cases of logical fallacy by language-modeling-driven retrieval and adaptation of historical cases. We design four complementary strategies to enrich the input representation for our model, based on external information about goals, explanations, counterarguments, and argument structure. Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies confirm that the representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. Finally, we dive deeper into the relationship between the properties of the retrieved cases and the model performance.

List of keywords

Natural Language Processing -> NLP: Information retrieval and text mining
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Case-based reasoning

4842

A New ANN-SNN Conversion Method with High Accuracy, Low Latency and Good Robustness

Bingsen Wang, Jian Cao, Jue Chen, Shuo Feng, Yuan Wang

Due to their low energy consumption, high robustness and fast inference speed, Spiking Neural Networks (SNNs), with good biological interpretability and the potential to be applied on neuromorphic hardware, are regarded as the third generation of Artificial Neural Networks (ANNs). Despite these advantages, the biggest challenge for SNNs is the training difficulty caused by the non-differentiability of spike signals. ANN-SNN conversion is an effective method that addresses this difficulty by converting parameters in ANNs to those in SNNs through a specific algorithm. However, ANN-SNN conversion also suffers from accuracy degradation and long inference time. In this paper, we reanalyze the relationship between the Integrate-and-Fire (IF) neuron model and the ReLU activation function, propose a StepReLU activation function more suitable for SNNs under membrane potential encoding, and use it to train ANNs. We then convert the ANNs to SNNs with extremely small conversion error and introduce a leakage mechanism into the SNNs to obtain the final models, which have high accuracy, low latency and good robustness and achieve state-of-the-art performance on various datasets such as CIFAR and ImageNet.
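
The relationship between an IF neuron and ReLU that conversion methods build on can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's StepReLU or its leakage mechanism: it assumes a constant input current, zero initial membrane potential, at most one spike per time step, and reset by subtraction. Under those assumptions the firing rate approximates a ReLU clipped at the threshold, up to a quantization error below one spike per simulation window.

```python
def if_neuron_rate(current, threshold=1.0, steps=8):
    """Firing rate of an Integrate-and-Fire neuron driven by a constant input."""
    v = 0.0          # membrane potential (assumed to start at zero)
    spikes = 0
    for _ in range(steps):
        v += current             # integrate the input
        if v >= threshold:       # fire at most once per time step
            spikes += 1
            v -= threshold       # reset by subtraction (assumption)
    return spikes / steps


def clipped_relu(x, threshold=1.0):
    """ReLU clipped at `threshold` and normalized to [0, 1]."""
    return min(max(x, 0.0), threshold) / threshold


# Compare the simulated rate with the clipped ReLU for a few inputs.
inputs = [-0.5, 0.25, 0.5, 2.0]
rates = [if_neuron_rate(x) for x in inputs]
targets = [clipped_relu(x) for x in inputs]
```

Increasing `steps` shrinks the quantization gap between rate and activation, which is one way to see why conversion accuracy and inference latency pull against each other.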

List of keywords

Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Robustness

4853

HOUDINI: Escaping from Moderately Constrained Saddles

Dmitrii Avdiukhin, Grigory Yaroslavtsev

We give polynomial-time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function $f \colon \mathbb R^d \to \mathbb R$ we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. This constitutes progress (without reliance on NP-oracles or altering the definitions to only account for certain constraints) on the main open question of the breakthrough work of [Ge et al. '15] who showed an analogous result for unconstrained and equality-constrained problems. Our results hold for both regular and stochastic gradient descent.
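
Why noise matters at a saddle point can be seen in a toy example. The sketch below is an unconstrained two-dimensional illustration only; it is not the paper's algorithm and does not handle inequality constraints. The test function, step size, noise scale, and seed are all assumptions chosen for the demonstration.

```python
import random


def grad(x, y):
    """Gradient of f(x, y) = x**2 + y**4 / 4 - y**2 / 2.

    The origin is a strict saddle (Hessian diag(2, -1)); the minima are at (0, 1) and (0, -1).
    """
    return 2.0 * x, y ** 3 - y


def descend(noise_scale, steps=2000, lr=0.1, seed=0):
    """Gradient descent started exactly at the saddle, with Gaussian noise added to the gradient."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * (gx + rng.gauss(0.0, noise_scale))
        y -= lr * (gy + rng.gauss(0.0, noise_scale))
    return x, y


stuck = descend(noise_scale=0.0)      # plain gradient descent never leaves the saddle
escaped = descend(noise_scale=0.01)   # noisy gradient descent drifts to one of the minima
```

Without noise the gradient at the origin is exactly zero, so the iterate never moves; a small perturbation is amplified along the negative-curvature direction until the iterate settles near a minimum.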

List of keywords

Machine Learning -> ML: Optimization

4870

TITAN: Task-oriented Dialogues with Mixed-Initiative Interactions

Sitong Yan, Shengli Song, Jingyang Li, Shiqi Meng, Guangneng Hu

In a multi-domain task-oriented dialogue system, users often make under- or over-specified requests, sometimes with ambiguous and cross-domain demands. However, most existing task-oriented dialogue datasets fail to consider mixed-initiative interaction and the corresponding strategies for handling these situations, which leads to low efficiency and poor collaboration ability in human-computer conversation. In this paper, we construct TITAN, a multi-domain task-oriented dialogue dataset with mixed-initiative strategies, from the large-scale dialogue corpus MultiWOZ 2.1. It contains a total of 1,800 human-human conversations in which the agent can either actively ask clarification questions or provide relevant information to address failure situations and implicit user requests. We report the results of several baseline models on system response generation and dialogue act prediction to assess the performance of mixed-initiative strategies. These models can effectively learn the designed mixed-initiative dialogue acts but are deficient at actively generating implicit requests, suggesting ample room for improvement in future studies.

List of keywords

Natural Language Processing -> NLP: Dialogue and interactive systems
Natural Language Processing -> NLP: Language generation
Natural Language Processing -> NLP: Resources and evaluation

4929

Denoised Self-Augmented Learning for Social Recommendation

Tianle Wang, Chao Huang, Lianghao Xia

Social recommendation has been increasingly investigated in a broad spectrum of online applications (e.g., e-commerce, online streaming) to leverage social information to help user-item interaction modeling. Recently, Self-Supervised Learning (SSL) has been outstandingly successful in alleviating data sparsity with augmented learning tasks. Inspired by this, recent attempts bring the benefits of SSL into social recommendation by supplementing the main supervised task with social-aware self-supervised signals. However, social information is unavoidably noisy for characterizing user preference, due to the ubiquitous presence of interest-irrelevant social connections, e.g., colleagues or classmates who do not share many common interests. To rectify this, we propose a new social recommender with a Denoised Cross-view Self-Augmented Learning paradigm (DSAL). It not only preserves the helpful social relations for enhancing user-item interaction modeling, but also allows personalized cross-view knowledge transfer with adaptive semantic alignment in embedding space. Experimental results on various recommendation benchmarks verify the advantages of our DSAL over state-of-the-art methods.

List of keywords

Data Mining -> DM: Applications
Data Mining -> DM: Information retrieval

4930

Online Task Assignment with Controllable Processing Time

Ruoyu Wu, Wei Bao, Liming Ge

We study a new online assignment problem, called the Online Task Assignment with Controllable Processing Time. In a bipartite graph, a set of online vertices (tasks) should be assigned to a set of offline vertices (machines) under the known adversarial distribution (KAD) assumption. We are the first to study controllable processing time in this scenario: there are multiple processing levels for each task, and a higher level brings larger utility but also larger processing delay. A machine can reject an assignment at the cost of a rejection penalty, taken from a pre-determined rejection budget. Different processing levels cause different penalties. We propose the Online Machine and Level Assignment (OMLA) Algorithm to simultaneously assign an offline machine and a processing level to each online task. We prove that OMLA achieves a 1/2-competitive ratio if each machine has an unlimited rejection budget and a Δ/(3Δ-1)-competitive ratio if each machine has an initial rejection budget up to Δ. Interestingly, the competitive ratios do not change under different settings on the controllable processing time and we can conclude that OMLA is "insensitive" to the controllable processing time.
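
The budget-dependent ratio Δ/(3Δ-1) quoted above can be tabulated exactly for a few budgets. The helper name is hypothetical, and treating Δ as a positive integer is an assumption made for this illustration.

```python
from fractions import Fraction


def budgeted_ratio(delta):
    """Evaluate delta / (3 * delta - 1) exactly for a positive integer budget bound."""
    return Fraction(delta, 3 * delta - 1)


ratios = {delta: budgeted_ratio(delta) for delta in (1, 2, 5, 100)}
for delta, ratio in ratios.items():
    print(f"delta={delta}: {ratio} ~ {float(ratio):.4f}")
```

The expression equals 1/2 at Δ = 1, decreases as Δ grows, and always stays strictly above 1/3.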

List of keywords

Planning and Scheduling -> PS: Scheduling
Planning and Scheduling -> PS: Planning under uncertainty

4936

Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

Pei Xu, Junge Zhang, Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Previous works argue that complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity to measure the difference between current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine the joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, the Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task which needs non-trivial strategies to defeat enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.
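
For context, a classical count-based exploration bonus of the kind the abstract builds on can be sketched as below. This is a generic tabular version with a bonus proportional to the inverse square root of the visit count; it is not the paper's joint policy diversity measure or its filtering-based constraint, and the class name and the `beta` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)), where N(s) counts visits to state s."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, state):
        # Record the visit, then return the bonus for the updated count.
        self.counts[state] += 1
        return self.beta / sqrt(self.counts[state])


bonus = CountBonus()
first_four = [bonus("s0") for _ in range(4)]   # repeated visits to the same state
novel = bonus("s1")                            # first visit to a new state
```

The bonus decays as a state is revisited, so rarely visited states keep a larger intrinsic reward; states must be hashable, which in practice usually means discretizing or hashing observations.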

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Deep reinforcement learning

4961

Capturing the Long-Distance Dependency in the Control Flow Graph via Structural-Guided Attention for Bug Localization

Yi-Fan Ma, Yali Du, Ming Li

To alleviate the burden of software maintenance, bug localization, which aims to automatically locate the buggy source files based on the bug report, has drawn significant attention in the software mining community. Recent studies indicate that the program structure in source code carries more semantics reflecting the program behavior, which is beneficial for bug localization. Benefiting from the rich structural information in the Control Flow Graph (CFG), CFG-based bug localization methods have achieved state-of-the-art performance. Existing CFG-based methods extract the semantic feature from the CFG via the graph neural network. However, the step-wise feature propagation in the graph neural network suffers from the problem of information loss when the propagation distance is long, while the long-distance dependency is rather common in the CFG. In this paper, we argue that the long-distance dependency is crucial for feature extraction from the CFG, and propose a novel bug localization model named sgAttention. In sgAttention, a specially designed structural-guided attention is employed to globally capture the information in the CFG, where features of irrelevant nodes are masked for each node to facilitate better feature extraction from the CFG. Experimental results on four widely-used open-source software projects indicate that, on average, sgAttention improves on the state-of-the-art bug localization methods by 32.9% and 29.2% and on the state-of-the-art pre-trained models by 5.8% and 4.9% in terms of MAP and MRR, respectively.
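
The two evaluation metrics named above, MAP and MRR, follow their standard information-retrieval definitions, sketched here for ranked lists of source files. The file names and the two example bug reports are hypothetical; no results from the paper are reproduced.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@rank taken at the rank of each relevant item."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_queries(metric, queries):
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in queries) / len(queries)


# Each query is one bug report: the model's ranked files and the truly buggy files.
queries = [
    (["a.py", "b.py", "c.py"], {"b.py"}),
    (["x.py", "y.py", "z.py"], {"x.py", "z.py"}),
]
mrr = mean_over_queries(reciprocal_rank, queries)
map_score = mean_over_queries(average_precision, queries)
```

MRR rewards placing the first buggy file near the top of the list, while MAP also accounts for how highly every buggy file is ranked.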

List of keywords

Data Mining -> DM: Mining codebase and software repositories

4969

Towards an Integrated View of Semantic Annotation for POIs with Spatial and Textual Information

Dabin Zhang, Ronghui Xu, Weiming Huang, Kai Zhao, Meng Chen

Categories of Points of Interest (POIs) facilitate location-based services in many respects, such as location search and POI recommendation. However, POI categories are often incomplete and new POIs are constantly being generated, which raises the demand for semantic annotation of POIs, i.e., labeling each POI with a semantic category. Previous methods usually model sequential check-in information of users to learn POI features for annotation. However, users’ check-ins are hard to obtain in reality, especially for newly created POIs. In this context, we present a Spatial-Textual POI Annotation (STPA) model for static POIs, which derives POI categories using only the geographic locations and names of POIs. Specifically, we design a GCN-based spatial encoder to model spatial correlations among POIs to generate POI spatial embeddings, and an attention-based text encoder to model the semantic contexts of POIs to generate POI textual embeddings. We finally fuse the two embeddings and preserve multi-view correlations for semantic annotation. We conduct comprehensive experiments to validate the effectiveness of STPA with POI data from AMap. Experimental results demonstrate that STPA substantially outperforms several competitive baselines, which indicates that STPA is a promising approach for annotating static POIs in map services.

List of keywords

Data Mining -> DM: Mining spatial and/or temporal data

4973

Exploiting Non-Interactive Exercises in Cognitive Diagnosis

FANGZHOU YAO, Zhenya Huang, Min Hou, Shiwei Tong, Qi Liu, Enhong Chen, Jing Sha, Shijin WANG

Cognitive Diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical student-exercise interaction logs to assess proficiency levels. Despite their effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail consists of students with few records. This phenomenon leads to a lack of supervision signals and results in inferior diagnosis among long-tail students. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order in observed-unobserved responses as auxiliary ranking-based training signals to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses similar to observed responses for effective supplementary training. Then we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the superiority of our framework on long-tailed data.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Education
Data Mining -> DM: Applications

4991

A Hierarchical Approach to Population Training for Human-AI Collaboration

Yi Loo, Chen Gong, Malika Meghjani

A major challenge for deep reinforcement learning (DRL) agents is to collaborate with novel partners that they did not encounter during the training phase. This is further worsened by the increased variance in action responses when DRL agents collaborate with human partners, due to the lack of consistency in human behaviors. Recent work has shown that training a single agent as the best response to a diverse population of training partners significantly increases the agent’s robustness to novel partners. We further enhance the population-based training approach by introducing a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration. Our agent learns multiple best-response policies as its low-level policies while at the same time learning a high-level policy that acts as a manager, allowing the agent to dynamically switch between the low-level best-response policies based on its current partner. We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment. We also conducted a human study in the same environment to test the effectiveness of our method when partnering with real human subjects.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction

5011

GPMO: Gradient perturbation-based contrastive learning for molecule optimization

Xixi Yang, li fu, Yafeng Deng, Yuansheng Liu, Dongsheng Cao, Xiangxiang Zeng

Optimizing molecules with desired properties is one of the essential steps in de novo drug design. Translation-based methods have achieved initial success but still struggle with the “exposure bias” problem. The challenge of preventing the “exposure bias” problem of molecule optimization lies in the need for both positive and negative samples of contrastive learning. That is because domain-specific knowledge is required if using data augmentation to generate positive samples for molecules, and randomly sampled negative samples are easily distinguished from the real samples. Hence, in this work, we propose a molecule optimization method called GPMO which employs a gradient perturbation-based contrastive learning method to prevent the “exposure bias” problem in translation-based molecule optimization. We conduct masked language model and SMILES translation tasks for pre-training and then optimize molecules with translation between starting and target molecules. The positive example is produced by large gradient perturbation, which maximizes its likelihood, whereas the negative example is obtained by small gradient perturbation, which minimizes its likelihood. With the help of these positive and negative examples, GPMO is better able to handle real and artificial samples. Our empirical studies show that GPMO outperforms the state-of-the-art molecule optimization methods. Furthermore, the negative and positive perturbations improve the robustness of GPMO.

List of keywords

Multidisciplinary Topics and Applications -> MDA: Bioinformatics

5012

Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder

Pei YU, liang zhang, Biao Fu, Yidong Chen

Most existing studies on Sign Language Translation (SLT) employ AutoRegressive Decoding Mechanism (AR-DM) to generate target sentences. However, the main disadvantage of the AR-DM is high inference latency. To address this problem, we introduce Non-AutoRegressive Decoding Mechanism (NAR-DM) into SLT, which generates the whole sentence at once. Meanwhile, to improve its decoding ability, we integrate the advantages of curriculum learning and NAR-DM and propose a Curriculum-based NAR Decoder (CND). Specifically, the lower layers of the CND are expected to predict simple tokens that could be predicted correctly using source-side information solely. Meanwhile, the upper layers could predict complex tokens based on the lower layers’ predictions. Therefore, our CND significantly reduces the model’s inference latency while maintaining its competitive performance. Moreover, to further boost the performance of our CND, we propose a mutual learning framework, containing two decoders, i.e., an AR decoder and our CND. We jointly train the two decoders and minimize the KL divergence between their outputs, which enables our CND to learn the forward sequential knowledge from the strengthened AR decoder. Experimental results on PHOENIX2014T and CSL-Daily demonstrate that our model consistently outperforms all competitive baselines and achieves 7.92/8.02 times speed-up compared to the AR SLT model respectively.
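
The mutual learning objective, which minimizes the KL divergence between the outputs of the two decoders, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether the divergence is symmetric or how it is averaged, so the symmetric per-position form and the function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one vocabulary distribution."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def mutual_kl_loss(ar_logits, nar_logits):
    """Symmetric KL between the two decoders, averaged over target positions.
    The symmetric form is an assumption made for this sketch."""
    total = 0.0
    for a, n in zip(ar_logits, nar_logits):
        p, q = softmax(a), softmax(n)
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(ar_logits)

# Two target positions over a three-token vocabulary.
ar = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
nar = [[0.0, 2.0, 0.0], [0.0, 1.0, 0.0]]
loss_same = mutual_kl_loss(ar, ar)
loss_diff = mutual_kl_loss(ar, nar)
```

The loss is zero when both decoders predict the same distributions and grows as they disagree, so adding it to each decoder's translation loss pulls the two sets of predictions together.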

List of keywords

Natural Language Processing -> NLP: Machine translation and multilinguality

5014

Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning

Hao Dong, Zhiyuan Ning, Pengyang Wang, Ziyue Qiao, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

Temporal knowledge graph (TKG) reasoning aims to predict missing future facts based on historical information and has gained increasing research interest recently. Much work has been done to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the number of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture that models the relation features of the TKG, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between the query subject and each object candidate across history time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from a path aggregation unit across the timeline, considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state-of-the-art by up to 4.8% absolute in MRR.

List of keywords

Data Mining -> DM: Knowledge graphs and knowledge base completion
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Sequence and graph learning

5048

Reconstruction-Aware Prior Distillation for Semi-supervised Point Cloud Completion

Zhaoxin Fan, Yulin He, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He

Point clouds scanned by real-world sensors are always incomplete, irregular, and noisy, making the point cloud completion task increasingly important. Though many point cloud completion methods have been proposed, most of them require a large number of paired complete-incomplete point clouds for training, which is labor-intensive. In contrast, this paper proposes a novel Reconstruction-Aware Prior Distillation semi-supervised point cloud completion method named RaPD, which takes advantage of a two-stage training scheme to reduce the dependence on a large-scale paired dataset. In training stage 1, the so-called deep semantic prior is learned from both unpaired complete and unpaired incomplete point clouds using a reconstruction-aware pretraining process. In training stage 2, we introduce a semi-supervised prior distillation process, where an encoder-decoder-based completion network is trained by distilling the prior into the network utilizing only a small number of paired training samples. A self-supervised completion module is further introduced to exploit the large number of unpaired incomplete point clouds, leading to an increase in the network’s performance. Extensive experiments on several widely used datasets demonstrate that RaPD, the first semi-supervised point cloud completion method, achieves superior performance to previous methods in both homologous and heterologous scenarios.

List of keywords

Computer Vision -> CV: 3D computer vision

5051

Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion

Shangqi Deng, Liang-Jian Deng, Xiao Wu, Ran Ran, Rui Wen

Transformer-based methods can achieve long-distance modeling, correlate spatial and spectral information, and have a strong inductive bias in various computer vision tasks. For the transformer, there are two common forms of multi-head self-attention (MSA), namely spatial MSA (Spa-MSA) and spectral MSA (Spe-MSA). Spa-MSA limits the global spatial response to a local window, even though it costs less computation. Spe-MSA, in turn, can calculate channel self-attention to accommodate high-resolution images. However, it ignores the local information that is quite important to low-level vision tasks. In this paper, we develop a so-called bidirectional dilation Transformer (BDT) for multispectral and hyperspectral image fusion (MHIF), aiming to leverage the advantages of both types of MSA and the latent multiscale information of the specific MHIF application. Specifically, the BDT mainly consists of two designed modules: 1) the dilation Spa-MSA (D-Spa) that dynamically increases the spatial receptive field through the given hollow strategy, and 2) the grouped Spe-MSA (G-Spe) to extract latent features inside the feature map and learn local data behavior. Moreover, considering the specific characteristics of MHIF, i.e., two inputs with different spatial resolutions, a bidirectional hierarchy strategy is employed in the BDT to make full use of the multiscale information of both inputs, thus achieving better performance. Extensive experiments on two commonly used datasets, i.e., CAVE and Harvard, verify the superiority of BDT over recent state-of-the-art methods, both visually and quantitatively. Additionally, we promise the release of code after possible acceptance.

List of keywords

Machine Learning -> ML: Attention models
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Multi-modal learning

5088

Revenue Maximization Mechanisms for an Uninformed Mediator with Communication Abilities

Zhikang Fan, Weiran Shen

Consider a market where a seller owns an item for sale and a buyer wants to buy an item. Each player has a private type. It can be costly and difficult for them to come to an agreement on their own through communication. However, with a mediator as a trusted third party, the players can both communicate privately with the mediator and do not need to worry about leaking too much or too little information. The mediator can design and commit to a multi-round communication protocol for both players. After each round, rational players will update their beliefs about the other player’s type. The mediator cannot force players to trade, but may influence their behaviors by sending messages to them. We study the problem of designing revenue-maximizing mechanisms for the mediator. We show that the mediator can, without loss of generality, focus on the set of direct and incentive-compatible mechanisms. Then we formulate this problem as a mathematical program. Moreover, we give an optimal solution to the optimization problem in closed form under certain technical conditions. Our mechanism is also simple and has a threshold structure. Interestingly, we find that in the optimal mechanism, the mediator may even lose money in some cases.

List of keywords

5090

Orientation-Independent Chinese Text Recognition in Scene Images

Haiyang Yu, Xiaocong Wang, Bin Li, Xiangyang Xue

Scene Text Recognition (STR) has attracted much attention due to its wide applications. Previous research pays more attention to the recognition of Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Unlike Latin texts, many vertical Chinese texts exist in natural scenes, which brings difficulties for current SOTA STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images, thus recognizing both horizontal and vertical text robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information. We conduct experiments on the Scene dataset of a Chinese text recognition benchmark, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further verify the effectiveness of our method, we additionally collect a Chinese Vertical Text Recognition (CVTR) dataset. The experimental results show that our method achieves a 45.63% improvement on CVTR when introducing CIRN to the baseline model.

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Vision and language

5092

FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks

Xinyu Fu, Irwin King

Heterogeneous graph neural networks (HGNNs) have shown advantages in learning typed and relational graph data. With a much larger parameter space than conventional GNNs, HGNNs are more likely to suffer from insufficient training data. While most real-world graphs are heterogeneous, many applications prohibit centralized data collection due to strict regulations (e.g., GDPR). Federated graph learning (FGL) enables multiple participating clients to collaboratively train a GNN without sharing local data. However, existing FGL methods focus on homogeneous GNNs or knowledge graph embeddings; few have considered heterogeneous graphs and HGNNs. In federated heterogeneous graph learning, clients may hold graphs of different or even private schemas, where conventional FL/FGL methods requiring the same model across different clients are inapplicable. Therefore, we design FedHGN, a novel and general FGL framework for HGNNs. Specifically, FedHGN applies schema-weight decoupling to enable schema-agnostic knowledge sharing and employs coefficients alignment to stabilize the federated training process and improve HGNN performance. With better privacy preservation, FedHGN consistently outperforms local training and conventional FL methods on three widely adopted heterogeneous graph datasets with varying client numbers.

List of keywords

Machine Learning -> ML: Federated learning
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Sequence and graph learning

5096

Dual Relation Knowledge Distillation for Object Detection

Zhen-Liang Ni, yang fukui, shengzhao wen, gang zhang

Knowledge distillation is an effective method for model compression. However, it is still challenging to apply knowledge distillation to detection tasks. Two key issues result in poor distillation performance for detection tasks. One is the serious imbalance between foreground and background features; the other is that small objects lack sufficient feature representation. To solve these issues, we propose a new distillation method named dual relation knowledge distillation (DRKD), including pixel-wise relation distillation and instance-wise relation distillation. The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation. By distilling the global pixel relation, the student detector can learn the relation between foreground and background features, avoiding the difficulty of distilling features directly under the feature imbalance issue. Besides, we find that instance-wise relation supplements valuable knowledge beyond independent features for small objects. Thus, the instance-wise relation distillation is designed, which calculates the similarity of different instances to obtain a relation matrix. More importantly, a relation filter module is designed to highlight valuable instance relations. The proposed dual relation knowledge distillation is general and can be easily applied to both one-stage and two-stage detectors. Our method achieves state-of-the-art performance, improving Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP and RetinaNet based on ResNet50 from 37.4% to 40.3% mAP on COCO 2017.
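
The instance-wise relation matrix, built from the similarity of different instances, can be sketched as below. The abstract does not name the similarity measure or the distillation loss, so cosine similarity and a mean squared error between teacher and student matrices are assumptions for illustration, and all function names are hypothetical; the relation filter module is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def relation_matrix(instances):
    """Pairwise similarity between all instance features (n x n)."""
    return [[cosine(a, b) for b in instances] for a in instances]

def relation_distill_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student relation matrices."""
    t = relation_matrix(teacher_feats)
    s = relation_matrix(student_feats)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

# Three detected instances; the student uses a smaller feature dimension.
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = relation_distill_loss(teacher, student)
```

Because both relation matrices are n x n regardless of feature width, a loss of this kind compares how instances relate to each other rather than the features themselves, so teacher and student need not share a feature dimension.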

List of keywords

Computer Vision -> CV: Recognition (object detection, categorization)

5098

Formal Explanations of Neural Network Policies for Planning

Renee Selvey, Alban Grastien, Sylvie Thiebaux

Deep learning is increasingly used to learn policies for planning problems. However, policies represented by neural networks are difficult to interpret, verify and trust. Existing formal approaches to post-hoc explanations provide concise reasons for a single decision made by an ML model. However, understanding planning policies requires explaining sequences of decisions. In this paper, we formulate the problem of finding explanations for the sequence of decisions recommended by a learnt policy in a given state. We show that, under certain assumptions, a minimal explanation for a sequence can be computed by solving a number of single-decision explanation problems that is linear in the length of the sequence. We present experimental results of our implementation of this approach for ASNets policies for classical planning domains.

List of keywords

Planning and Scheduling -> PS: Model-based reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
Planning and Scheduling -> PS: Learning in planning and scheduling

5102

KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation

Yuxi Feng, Xiaoyuan Yi, Laks Lakshmanan, Xing Xie

Self-training (ST) has come to fruition in language understanding tasks by producing pseudo labels, which reduces the labeling bottleneck of language model fine-tuning. Nevertheless, in facilitating semi-supervised controllable language generation, ST faces two key challenges. First, augmented by self-generated pseudo text, generation models tend to over-exploit the previously learned text distribution, suffering from mode collapse and poor generation diversity. Second, generating pseudo text in each iteration is time-consuming, severely decelerating the training process. In this work, we propose KEST, a novel and efficient self-training framework to handle these problems. KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator. We demonstrate both theoretically and empirically that KEST can benefit from more diverse pseudo text in an efficient manner, which allows not only refining and exploiting the previously fitted distribution but also enhanced exploration towards a larger potential text space, providing a guarantee of improved performance. Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.

List of keywords

Natural Language Processing -> NLP: Language generation

5107

Learning Prototype Classifiers for Long-Tailed Recognition

Saurabh Sharma, Yongqin Xian, Ning Yu, Ambuj Singh

The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real world. Most recent works in LTR use softmax classifiers that have a tendency to correlate classifier norm with the amount of training data for a given class. On the other hand, Prototype classifiers do not suffer from this shortcoming and can deliver promising results simply using Nearest-Class-Mean (NCM), a special case where prototypes are empirical centroids. However, the potential of Prototype classifiers as an alternative to softmax in LTR is relatively underexplored. In this work, we propose Prototype classifiers, which jointly learn prototypes that minimize average cross-entropy loss based on probability scores from distances to prototypes. We theoretically analyze the properties of Euclidean distance based prototype classifiers that lead to stable gradient-based optimization which is robust to outliers. We further enhance Prototype classifiers by learning channel-dependent temperature parameters to enable independent distance scales along each channel. Our analysis shows that prototypes learned by Prototype classifiers are better separated than empirical centroids. Results on four long-tailed recognition benchmarks show that the Prototype classifier outperforms or is comparable to the state-of-the-art methods.
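
A minimal sketch of a distance-based prototype classifier with per-channel temperatures is given below. The exact form of the logits is an assumption: here each class score is the negative Euclidean distance to its prototype after dividing every channel by its own temperature, and the function names are hypothetical. In training, the prototypes and temperatures would be the learned parameters.

```python
import math

def scaled_distance(x, proto, temps):
    """Euclidean distance with each channel divided by its own temperature."""
    return math.sqrt(sum(((xi - pi) / t) ** 2 for xi, pi, t in zip(x, proto, temps)))

def prototype_probs(x, prototypes, temps):
    """Softmax over negative distances to the class prototypes."""
    logits = [-scaled_distance(x, p, temps) for p in prototypes]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, label, prototypes, temps):
    """Loss minimized with respect to prototypes and temperatures in training."""
    return -math.log(prototype_probs(x, prototypes, temps)[label])

# Two classes in a two-channel feature space.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
temps = [1.0, 1.0]
x = [1.0, 0.0]
probs = prototype_probs(x, prototypes, temps)
```

With all temperatures fixed to 1 and prototypes set to the class means, taking the most probable class reduces to the Nearest-Class-Mean rule mentioned in the abstract.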

List of keywords

Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Few-shot learning

5126

Learning Attention from Attention: Efficient Self-refinement Transformer for Face Super-resolution

Guanxin Li, Jingang Shi, Yuan Zong, Fei Wang, Tian Wang, Yihong Gong

Recently, Transformer-based architectures have been introduced into the face super-resolution task due to their advantage in capturing long-range dependencies. However, these approaches tend to integrate global information in a large searching region, which neglects to focus on the most relevant information and induces blurry effects from irrelevant textures. Some improved methods simply constrain self-attention to a local window to suppress the useless information, but this also limits the capability of recovering high-frequency details when flat areas dominate the local searching window. To address the above issues, we propose a novel self-refinement mechanism which can adaptively achieve texture-aware reconstruction in a coarse-to-fine procedure. Generally, the primary self-attention is first conducted to reconstruct the coarse-grained textures and detect the fine-grained regions requiring further compensation. Then, region selection attention is performed to refine the textures in these key regions. Since self-attention considers the channel information on tokens equally, we employ a dual-branch feature integration module to prioritize the important channels in feature extraction. Furthermore, we design a wavelet fusion module which integrates shallow-layer structure and deep-layer detailed features to recover realistic face images in the frequency domain. Extensive experiments demonstrate the effectiveness on a variety of datasets. The code will be released.

List of keywords

Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Computational photography

5145

FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer

Chenghao Liu, Xiaoyang Qu, Jianzong Wang, Jing Xiao

Federated Learning (FL) has attracted wide attention because it enables decentralized learning while ensuring data privacy. However, most existing methods unrealistically assume that the classes encountered by local clients are fixed over time. When new classes are learned, this impractical assumption makes the model’s catastrophic forgetting of old classes significantly more severe. Moreover, due to the limitation of communication cost, it is challenging to use large-scale models in FL, which affects the prediction accuracy. To address these challenges, we propose a novel framework, Federated Enhanced Transformer (FedET), which simultaneously achieves high accuracy and low communication cost. Specifically, FedET uses Enhancer, a tiny module, to absorb and communicate new knowledge, and applies pre-trained Transformers combined with different Enhancers to ensure high precision on various tasks. To address local forgetting caused by new classes of new tasks and global forgetting brought by non-i.i.d. class imbalance across different local clients, we propose an Enhancer distillation method to correct the imbalance between old and new knowledge and mitigate the non-i.i.d. problem. Experimental results demonstrate that FedET’s average accuracy on a representative benchmark dataset is 14.1% higher than the state-of-the-art method, while FedET saves 90% of the communication cost compared to the previous method.

List of keywords

Machine Learning -> ML: Federated learning
Machine Learning -> ML: Incremental learning

5148

Intent-aware Recommendation via Disentangled Graph Contrastive Learning

Yuling Wang, Xiao Wang, Xiangzhou Huang, yanhua yu, Haoyang Li, Mengdi Zhang, Zirui Guo, Wei Wu

Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to their powerful ability to learn from user behavior data. Understanding user intents from behavior data is the key to recommender systems, which poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents, especially when user behavior data is usually inadequate in reality. The other is how to relate behaviors to intents for a more explainable recommender system, given that different behaviors have different intent distributions. In this paper, we present the Intent-aware Recommendation via Disentangled Graph Contrastive Learning (IDCL), which simultaneously learns interpretable intents and behavior distributions over those intents. Specifically, we first model the user behavior data as a user-item-concept graph, and design a GNN-based behavior disentangling module to learn the different intents. Then we propose intent-wise contrastive learning to enhance the intent disentangling and meanwhile infer the behavior distributions. Finally, coding rate reduction regularization is introduced to make the behaviors of different intents orthogonal. Extensive experiments demonstrate the effectiveness of IDCL in terms of substantial improvement and interpretability.

List of keywords

Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
Data Mining -> DM: Recommender systems

5155

Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data

Shengchao Chen, Guodong Long, Tao Shen, Jing Jiang

To tackle the global climate challenge, there is an urgent need to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite this urgency, heterogeneous meteorological sensors across countries and regions, which inevitably cause multivariate heterogeneity and data exposure, have become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach is proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism is adopted to satisfy low-resourced sensors’ communication and computational constraints. The effectiveness of the proposed method is demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.

List of keywords

Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Federated learning

5164

Learning to Binarize Continuous Features for Neuro-Rule Networks

Wei Zhang, Yongxiang Liu, Zhuo Wang, Jianyong Wang

Neuro-Rule Networks (NRNs) have emerged as a promising neuro-symbolic method, benefiting from the ability to equate fully-connected neural networks with logic rules. To support learning logic rules consisting of boolean variables, input features must be converted into binary representations. Unlike discrete features, which can be directly transformed by one-hot encodings, continuous features need to be binarized based on some numerical intervals. Existing studies usually select the bound values of intervals based on empirical strategies (e.g., equal-width intervals). However, this is not optimal since the bounds are fixed and cannot be optimized to accommodate the ultimate training target. In this paper, we propose AutoInt, an approach that automatically binarizes continuous features and enables the intervals to be optimized with NRNs in an end-to-end fashion. Specifically, AutoInt automatically selects an interval for a given continuous feature in a soft manner to enable a differentiable learning procedure of interval-related parameters. Moreover, it introduces an additional soft K-means clustering loss to make the interval centres approach the original feature value distribution, thus reducing the risk of overfitting intervals. We conduct comprehensive experiments on public datasets and demonstrate the effectiveness of AutoInt in boosting the performance of NRNs.

List of keywords

Machine Learning -> ML: Neuro-symbolic methods

5168

A Fast Maximum k-Plex Algorithm Parameterized by the Degeneracy Gap

Zhengren Wang, Yi Zhou, Chunyu Luo, Mingyu Xiao

Given a graph, a k-plex is a vertex set in which each vertex is non-adjacent to at most k-1 other vertices in the set. The maximum k-plex problem, which asks for the largest k-plex in a given graph, is an important but computationally challenging problem in applications like graph search and community detection. So far, there are a number of empirical algorithms without sufficient theoretical explanation of their efficiency. We try to bridge this gap by defining a novel parameter of the input instance, g_k(G), the gap between the degeneracy bound and the size of the maximum k-plex in the given graph, and presenting an exact algorithm parameterized by g_k(G). In other words, we design an algorithm with running time polynomial in the size of the input graph and exponential in g_k(G), where k is a constant. Usually, g_k(G) is small and bounded by O(log(|V|)) in real-world graphs, indicating that the algorithm runs in polynomial time. We also carry out massive experiments and show that the algorithm is competitive with the state-of-the-art solvers. Additionally, for large k values such as 15 and 20, our algorithm has superior performance over existing algorithms.
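
The k-plex definition stated in the abstract can be checked directly. The following is a minimal sketch that only verifies the definition; it is not the parameterized algorithm of the paper. The adjacency-set representation and the name `is_k_plex` are assumptions made for this example.

```python
def is_k_plex(adj, vertices, k):
    """Return True if `vertices` forms a k-plex in the undirected graph `adj`.

    `adj` maps each vertex to the set of its neighbours (hypothetical
    representation chosen for this sketch).
    """
    s = set(vertices)
    for v in s:
        # Count the neighbours of v that lie inside the candidate set.
        inside = len(adj.get(v, set()) & (s - {v}))
        # Other vertices of the set that v is NOT adjacent to.
        non_adjacent = len(s) - 1 - inside
        if non_adjacent > k - 1:
            return False
    return True

# Path graph 0 - 1 - 2: vertex 0 misses exactly one other vertex (vertex 2).
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

With k = 1 the definition reduces to a clique, so the path above is a 2-plex but not a 1-plex.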

List of keywords

Search -> S: Heuristic search
Search -> S: Combinatorial search and optimisation

5171

MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li

Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. However, in multi-agent reinforcement learning (MARL), these techniques face challenges because each agent only receives partial observation from an environment influenced by others, resulting in correlated observations in the agent dimension. So it is necessary to consider agent-level information in representation learning for MARL. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space. Specifically, we use an attention reconstruction model for recovering and the model is trained via contrastive learning. MA2CL allows better utilization of contextual information at the agent level, facilitating the training of MARL agents for cooperation tasks. Extensive experiments demonstrate that our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.

List of keywords

Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Representation learning

5176

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e.g., summary), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with visual information, which is often obtained through a general-purpose visual feature extractor. However, the generally extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning the summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge distilled from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can achieve a significant improvement on small datasets or even datasets with limited training data.

List of keywords

Natural Language Processing -> NLP: Summarization
Machine Learning -> ML: Multi-modal learning

5195

More for Less: Safe Policy Improvement with Stronger Performance Guarantees

Patrick Wienhöft, Marnix Suilen, Thiago Simão, Clemens Dubslaff, Christel Baier, Nils Jansen

In an offline reinforcement learning (RL) setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy’s performance. We present a novel approach to the SPI problem that requires less data for such guarantees. Specifically, we devise efficient transformations on the data set and the underlying environment model as preprocessing steps of the well-established SPI with baseline bootstrapping (SPIBB) method. Our implementation and experimental evaluation on standard benchmarks show that the sample complexity for SPI can indeed be significantly reduced compared to standard SPIBB.

List of keywords

Machine Learning -> ML: Reinforcement learning
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Planning under uncertainty

5220

Stochastic Population Update Can Provably Be Helpful in Multi-Objective Evolutionary Algorithms

Chao Bian, Yawen Zhou, Miqing Li, Chao Qian

Evolutionary algorithms (EAs) have been widely and successfully applied to solve multi-objective optimization problems, due to their nature of population-based search. Population update is a key component in multi-objective EAs (MOEAs), and it is performed in a greedy, deterministic manner. That is, the next-generation population is formed by selecting the first population-size ranked solutions (based on some selection criteria, e.g., non-dominated sorting, crowdedness and indicators) from the collection of the current population and newly-generated solutions. In this paper, we question this practice. We show analytically that introducing randomness into the population update procedure in MOEAs can be beneficial for the search. More specifically, we prove that the expected running time of a well-established MOEA (SMS-EMOA) for solving a commonly studied bi-objective problem, OneJumpZeroJump, can be exponentially decreased by replacing its deterministic population update mechanism with a stochastic one. Empirical studies also verify the effectiveness of the proposed stochastic population update method. This paper is an attempt to challenge a common practice for the population update in MOEAs. Its positive results, which might hold more generally, should encourage the exploration of developing new MOEAs in the area.

List of keywords

Search -> S: Evolutionary computation

5225

Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction

Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King

Organic reaction prediction is an important task in drug discovery. Recently, non-autoregressive reaction prediction has been achieved through modeling redistribution of electrons, reaching state-of-the-art top-1 accuracy and enabling parallel sampling. However, the current non-autoregressive decoder does not simultaneously fulfill two important rules of electron distribution modeling, the electron-counting rule and the symmetry rule, which violates the physical constraints of chemical reactions and thereby impairs the model performance. In this work, we propose a novel framework, ReactionSink, which combines two doubly stochastic self-attention mappings to obtain electron redistribution predictions that follow the above two constraints, and we further extend our solution to the general multi-head attention mechanism with augmented constraints. To achieve this, we apply Sinkhorn’s algorithm to iteratively update self-attention mappings, which imposes a doubly conservative constraint as an additional information prior on electron redistribution modeling. We theoretically show that our ReactionSink can satisfy both rules at the same time while the current decoder mechanism has to violate either of them. Empirical results demonstrate that our approach consistently improves the predictive performance of non-autoregressive models and does not bring unbearable additional computational cost.
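
For readers unfamiliar with Sinkhorn's algorithm, the sketch below shows the generic procedure the abstract refers to: alternating row and column normalisation of a positive square matrix until it is approximately doubly stochastic. This is only the textbook iteration, not ReactionSink; how the paper applies it to self-attention maps is not reproduced here, and the function name and iteration count are illustrative assumptions.

```python
def sinkhorn(matrix, iterations=100):
    """Alternately normalise rows and columns of a positive square matrix.

    Generic Sinkhorn iteration; for strictly positive entries the result
    converges to a doubly stochastic matrix (all row and column sums 1).
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(iterations):
        # Row normalisation: make every row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [x / row_sum for x in m[i]]
        # Column normalisation: make every column sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m

balanced = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

After enough iterations both the row sums and the column sums of `balanced` are close to 1, which is the "doubly stochastic" property named in the title.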

List of keywords

Machine Learning -> ML: Structured prediction
Machine Learning -> ML: Attention models
Multidisciplinary Topics and Applications -> MDA: Physical sciences

5253

Runtime Analyses of Multi-Objective Evolutionary Algorithms in the Presence of Noise

Matthieu Dinot, Benjamin Doerr, Ulysse Hennebelle, Sebastian Will

In single-objective optimization, it is well known that evolutionary algorithms can tolerate a certain amount of noise in the evaluation of the objective function even without further adjustments. In contrast, this question is not at all understood for multi-objective optimization. In this work, we conduct the first mathematical runtime analysis of a simple multi-objective evolutionary algorithm (MOEA) on a classic benchmark in the presence of noise in the objective functions. We prove that when bit-wise prior noise with rate $p \le \alpha/n$, $\alpha$ a suitable constant, is present, the simple evolutionary multi-objective optimizer (SEMO) without any adjustments to cope with noise finds the Pareto front of the OneMinMax benchmark in time $O(n^2\log n)$, just as in the case without noise. Given that the problem here is to arrive at a population consisting of $n+1$ individuals witnessing the Pareto front, this is a surprisingly strong robustness to noise (note that comparably simple evolutionary algorithms cannot optimize the single-objective OneMax problem in polynomial time when $p = \omega(\log(n)/n)$). Our proofs suggest that the strong robustness of the MOEA stems from its implicit diversity mechanism, which allows it to compute a population covering the whole Pareto front. Interestingly, and different from what has been observed in single-objective optimization, this result only holds when the objective value of a solution is determined only once and the algorithm from that point on works with this, possibly noisy, objective value. We prove that when all solutions are reevaluated in each iteration, then any noise rate $p = \omega(\log(n)/n^2)$ leads to a super-polynomial runtime.
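
As a rough illustration of the setting, the sketch below evaluates the OneMinMax benchmark under bit-wise prior noise. It assumes the usual definitions: OneMinMax maps a bit string to the pair (number of zeros, number of ones), and bit-wise prior noise flips each bit independently with probability $p$ before evaluation. The function names are hypothetical, and the SEMO algorithm itself is not shown.

```python
import random

def one_min_max(x):
    """OneMinMax objective: (number of zeros, number of ones) of a bit list."""
    ones = sum(x)
    return (len(x) - ones, ones)

def noisy_one_min_max(x, p, rng):
    """Evaluate OneMinMax under bit-wise prior noise (assumed noise model).

    Each bit is flipped independently with probability p before the
    evaluation; the solution x itself is left unchanged.
    """
    noisy = [1 - b if rng.random() < p else b for b in x]
    return one_min_max(noisy)

rng = random.Random(0)
x = [1, 0, 1, 1]
value = noisy_one_min_max(x, p=1.0 / len(x), rng=rng)
```

Every bit string is Pareto-optimal for OneMinMax, so the Pareto front consists of the $n+1$ objective vectors $(n-i, i)$, matching the population size mentioned in the abstract.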

List of keywords

Search -> S: Heuristic search

5260

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiu-Shi Zhu, Eng Siong Chng

Audio-visual speech recognition (AVSR) research has achieved great success recently by improving the noise-robustness of audio-only automatic speech recognition (ASR) with noise-invariant visual information. However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for the downstream speech recognition task. In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. Specifically, we design a global interaction model to capture the A-V complementary relationship on the modality level, as well as a local alignment approach to model the A-V temporal consistency on the frame level. Such a holistic view of cross-modal correlations enables better multimodal representations for AVSR. Experiments on public benchmarks LRS3 and LRS2 show that our GILA outperforms the supervised learning state-of-the-art.

List of keywords

Natural Language Processing -> NLP: Speech

5274

Parameterized Local Search for Max c-Cut

Jaroslav Garvardt, Niels Grüttemeier, Christian Komusiewicz, Nils Morawietz

In the NP-hard Max $c$-Cut problem, one is given an undirected edge-weighted graph $G$ and wants to color the vertices of $G$ with $c$ colors such that the total weight of edges with distinctly colored endpoints is maximal. The case with $c=2$ is the famous Max Cut problem. To deal with the NP-hardness of this problem, we study parameterized local search algorithms. More precisely, we study LS-Max $c$-Cut where we are additionally given a vertex coloring $f$ and an integer $k$ and the task is to find a better coloring $f'$ that differs from $f$ in at most $k$ entries, if such a coloring exists; otherwise, $f$ is $k$-optimal. We show that LS-Max $c$-Cut presumably cannot be solved in $g(k)\cdot n^{O(1)}$ time even on bipartite graphs, for all $c\ge 2$. We then show an algorithm for LS-Max $c$-Cut with running time $O((3e\Delta)^k\cdot c\cdot k^3\cdot\Delta\cdot n)$, where $\Delta$ is the maximum degree of the input graph. Finally, we evaluate the practical performance of this algorithm in a hill-climbing approach as a post-processing step for state-of-the-art heuristics for Max $c$-Cut. We show that using parameterized local search, the results of this heuristic can be further improved on a set of standard benchmark instances.
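
To make the LS-Max $c$-Cut problem statement concrete, here is a brute-force sketch that enumerates every recoloring of at most $k$ vertices. It deliberately ignores efficiency, taking roughly $(nc)^k$ steps, and is not the $O((3e\Delta)^k\cdot c\cdot k^3\cdot\Delta\cdot n)$ algorithm from the paper. The edge-list representation and the function names are assumptions made for this example.

```python
from itertools import combinations, product

def cut_weight(edges, coloring):
    """Total weight of edges whose endpoints have different colors."""
    return sum(w for u, v, w in edges if coloring[u] != coloring[v])

def improve(edges, coloring, c, k):
    """Return a strictly better coloring that differs from `coloring` in at
    most k vertices, or None if `coloring` is k-optimal.

    Exhaustive enumeration for illustration only.
    """
    base = cut_weight(edges, coloring)
    vertices = list(coloring)
    for size in range(1, k + 1):
        for subset in combinations(vertices, size):
            for colors in product(range(c), repeat=size):
                candidate = dict(coloring)
                candidate.update(zip(subset, colors))
                if cut_weight(edges, candidate) > base:
                    return candidate
    return None

# Unit-weight triangle, everything initially colored 0 (cut weight 0).
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
better = improve(triangle, {0: 0, 1: 0, 2: 0}, c=2, k=1)
```

Calling `improve` repeatedly until it returns `None` gives the hill-climbing post-processing loop described in the abstract, here with an exhaustive rather than parameterized search step.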

List of keywords

Search -> S: Local search

5281

Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification

Conghao Xiong, Hao Chen, Joseph J.Y. Sung, Irwin King

Multiple Instance Learning (MIL) and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification. However, unlike human pathologists who selectively observe specific regions of histopathology tissues under different magnifications, most methods do not incorporate multiple resolutions of the WSIs, hierarchically and attentively, thereby leading to a loss of focus on the WSIs and information from other resolutions. To resolve this issue, we propose the Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs, which can dynamically and attentively discover the discriminative regions across multiple resolutions of the WSIs. Within this framework, to further enhance the performance of the transformer and obtain a more holistic WSI (bag) representation, we propose an Integrated Attention Transformer, consisting of multiple Integrated Attention Modules, which is the combination of a transformer layer and an aggregation module that produces a bag representation based on every instance representation in that bag. The results of the experiments show that our method achieved state-of-the-art performances on multiple datasets, including Camelyon16, TCGA-RCC, TCGA-NSCLC, and our in-house IMGC dataset.

List of keywords

Computer Vision -> CV: Biomedical image analysis
Machine Learning -> ML: Attention models
Machine Learning -> ML: Weakly supervised learning

5292

Optimal Decision Tree Policies for Markov Decision Processes

Daniël Vos, Sicco Verwer

Interpretability of reinforcement learning policies is essential for many real-world tasks, but learning such interpretable policies is a hard problem. In particular, rule-based policies such as decision trees and rule lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a decision tree that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a tradeoff between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.

List of keywords

Planning and Scheduling -> PS: Markov decisions processes
Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Combinatorial search and optimisation

5293

Proportionality Guarantees in Elections with Interdependent Issues

Markus Brill, Evangelos Markakis, Georgios Papasotiropoulos, Jannik Peters

We consider a multi-issue election setting over a set of possibly interdependent issues with the goal of achieving proportional representation of the views of the electorate. To this end, we employ a proportionality criterion suggested by Skowron and Gorecki [2022], that guarantees fair representation for all groups of voters of sufficient size. For this criterion, there exist rules that perform well in the case where all the issues have a binary domain and are independent of each other. In particular, this has been shown for Proportional Approval Voting (PAV) and for the Method of Equal Shares (MES). In this paper, we go two steps further: we generalize these guarantees for issues with a non-binary domain, and, most importantly, we consider extensions to elections with dependencies among issues, where we identify restrictions that lead to analogous results. To achieve this, we define appropriate generalizations of PAV and MES to handle conditional ballots. In addition to proportionality considerations, we also examine the computational properties of the conditional version of MES. Our findings indicate that the conditional case poses additional challenges and differs significantly from the unconditional one, both in terms of proportionality guarantees and computational complexity.

List of keywords

Game Theory and Economic Paradigms -> GTEP: Computational social choice

5308

Can I Really Do That? Verification of Meta-Operators via Stackelberg Planning

Florian Pham, Alvaro Torralba

Macro-operators are a common reformulation method in planning that compactly describes the preconditions and effects of a sequence of operators for achieving the goals of the domain. We introduce meta-operators, which allow for using different sequences of actions in each state. We introduce a tool that can automatically verify whether a meta-operator is valid, i.e., the represented behavior is always doable. We check this for all instantiations of the meta-operator and all reachable states via a compilation into Stackelberg planning, a form of adversarial planning. Our results show that meta-operators learned for multiple domains can often express behaviors in a more compact way than standard macro-operators, potentially improving planners’ performance.

List of keywords

Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Search in planning and scheduling

5311

The First Proven Performance Guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a Combinatorial Optimization Problem

Sacha Cerf, Benjamin Doerr, Benjamin Hebras, Yakob Kahane, Simon Wietheger

The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is one of the most prominent algorithms to solve multi-objective optimization problems. Recently, the first mathematical runtime guarantees have been obtained for this algorithm, however only for artificial benchmark problems. In this work, we give the first proven performance guarantees for a classic optimization problem, the NP-complete bi-objective minimum spanning tree problem. More specifically, we show that the NSGA-II with population size $N \ge 4((n-1)w_{\max} + 1)$ computes all extremal points of the Pareto front in an expected number of $O(m^2 n w_{\max} \log(n w_{\max}))$ iterations, where $n$ is the number of vertices, $m$ the number of edges, and $w_{\max}$ is the maximum edge weight in any objective function. This result confirms, via mathematical means, the good performance of the NSGA-II observed empirically. It also shows that mathematical analyses of this algorithm are not only possible for simple artificial benchmark problems, but also for more complex combinatorial optimization problems. As a side result, with a subset of our arguments, we also obtain a new analysis of the performance of the simple global SEMO algorithm on the bi-objective minimum spanning tree problem, which improves the previous best result by a factor of $|F|$, the number of extremal points of the Pareto front, a set that can be as large as $n w_{\max}$. The main reason for this improvement is our observation that both multi-objective evolutionary algorithms find the different extremal points in parallel rather than sequentially, as assumed in the previous proofs.

List of keywords

Search -> S: Evolutionary computation
Search -> S: Heuristic search

5323

Sampling Ex-Post Group-Fair Rankings

Sruthi Gorantla, Amit Deshpande, Anand Louis

Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $\mathcal{D}$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-$k$ ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values. We propose two algorithms to sample a random group-fair ranking from the distribution $\mathcal{D}$ mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time $O(k^2\ell)$, where $\ell$ is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution $\epsilon$-close to $\mathcal{D}$ in total variation distance and has expected running time $O^*(k^2\ell^2)$ (where $O^*$ suppresses logarithmic and error terms), when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of $k$. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.

List of keywords

AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
AI Ethics, Trust, Fairness -> ETF: Bias
Search -> S: Combinatorial search and optimisation

5367

FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training

Xin’ao Wang, Huan Li, Ke Chen, Lidan Shou

This study proposes FedBFPT (Federated Bert Further Pre-Training), a Federated Learning (FL) framework for further pre-training the Bert language model in specialized domains while addressing privacy concerns. FedBFPT enables multiple clients to collaboratively train the shallower layers of Bert, which are more important in the further pre-training stage, without the need to share private data. To achieve this, FedBFPT involves building a local model at each client, progressively training the shallow layers of local models while sampling deep layers, and aggregating trained parameters on a server to generate the final global model. This approach utilizes multiple smaller local models to further pre-train a global model targeted at specific tasks via fine-tuning, resulting in a reduction in resource usage while maintaining model accuracy. Theoretical analysis is conducted to support the efficiency of FedBFPT, and experiments are conducted on corpora across domains such as medicine, biology, and computer science. Results indicate that FedBFPT achieves performance levels comparable to traditional FL methods while reducing computation and communication costs by 46.70% and 7.04%, respectively, even approaching the performance of centralized training models.

List of keywords

Natural Language Processing -> NLP: Applications
Machine Learning -> ML: Federated learning

3151

Learning Dissemination Strategies for External Sources in Opinion Dynamic Models with Cognitive Biases

Abdullah Al Maruf, Luyao Niu, Bhaskar Ramasubramanian, Andrew Clark, Radha Poovendran

The opinions of members of a population are influenced by opinions of their peers, their own predispositions, and information from external sources via one or more information channels (e.g., news, social media). Due to individual cognitive biases, the perceptual impact of and importance assigned by agents to information on each channel can be different. In this paper, we propose a model of opinion evolution that uses prospect theory to represent perception of information from the external source along each channel. Our prospect-theoretic model reflects traits observed in humans such as loss aversion, assigning inflated (deflated) values to low (high) probability events, and evaluating outcomes relative to an individually known reference point. We consider the problem of determining information dissemination strategies for the external source to adopt in order to drive opinions of individuals towards a desired value. However, computing a strategy faces the challenge that agents’ initial predispositions and the functions characterizing their perceptions of disseminated information might be unknown. We overcome this challenge by using Gaussian process learning to estimate these unknown parameters. When the external source sends information over multiple channels, the problem of jointly selecting optimal dissemination strategies is, in general, combinatorial. We prove that this problem is submodular, and design near-optimal dissemination algorithms. We evaluate our model on three different widely used large graphs that represent real-world social interactions. Our results indicate that the external source can effectively drive opinions towards a desired value when using prospect-theory based dissemination strategies.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Other

234

Learning to Send Reinforcements: Coordinating Multi-Agent Dynamic Police Patrol Dispatching and Rescheduling via Reinforcement Learning

Waldy Joe, Hoong Chuin Lau

We address the problem of coordinating multiple agents in dynamic police patrol scheduling via a Reinforcement Learning (RL) approach. Our approach utilizes Multi-Agent Value Function Approximation (MAVFA) with a rescheduling heuristic to learn dispatching and rescheduling policies jointly. Often, police operations are divided into multiple sectors for more effective and efficient operations. In a dynamic setting, incidents occur throughout the day across different sectors, disrupting initially-planned patrol schedules. To maximize policing effectiveness, police agents from different sectors cooperate by sending reinforcements to support one another in their incident response and even routine patrol. This poses an interesting research challenge on how to make such complex dispatching and rescheduling decisions involving multiple agents in a coordinated fashion within an operationally reasonable time. Unlike existing Multi-Agent RL (MARL) approaches which solve similar problems by decomposing either the problem or the action into multiple components, our approach learns the dispatching and rescheduling policies jointly without any decomposition step. In addition, instead of directly searching over the joint action space, we incorporate an iterative best response procedure as a decentralized optimization heuristic and an explicit coordination mechanism for scalable and coordinated decision-making. We evaluate our approach against the commonly adopted two-stage approach and conduct a series of ablation studies to ascertain the effectiveness of our proposed learning and coordination mechanisms.

List of keywords

Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Applications

4287

Learning When to Advise Human Decision Makers

Gali Noti, Yiling Chen

Artificial intelligence (AI) systems are increasingly used for providing advice to facilitate human decision making in a wide range of domains, such as healthcare, criminal justice, and finance. Motivated by limitations of the current practice where algorithmic advice is provided to human users as a constant element in the decision-making pipeline, in this paper we raise the question of when algorithms should provide advice. We propose a novel design of AI systems in which the algorithm interacts with the human user in a two-sided manner and aims to provide advice only when it is likely to be beneficial for the user in making their decision. The results of a large-scale experiment show that our interactive advising approach manages to provide advice at times of need and to significantly improve human decision making compared to fixed, non-interactive, advising approaches. This approach has additional advantages in facilitating human learning, preserving complementary strengths of human decision makers, and leading to more positive responsiveness to the advice.

List of keywords

Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Applications

141

Causal-Based Supervision of Attention in Graph Neural Network: A Better and Simpler Choice towards Powerful Attention

Hongjun Wang, Jiyuan Chen, Lun Du, Qiang Fu, Shi Han, Xuan Song

Recent years have witnessed the great potential of attention mechanisms in graph representation learning. However, while variants of attention-based GNNs are setting new benchmarks for numerous real-world datasets, recent works have pointed out that their induced attentions are less robust and generalizable against noisy graphs due to a lack of direct supervision. In this paper, we present a new framework which utilizes the tool of causality to provide a powerful supervision signal for the learning process of attention functions. Specifically, we estimate the direct causal effect of attention on the final prediction, and then maximize this effect to guide attention towards more meaningful neighbors. Our method can serve as a plug-and-play module for any canonical attention-based GNN in an end-to-end fashion. Extensive experiments on a wide range of benchmark datasets illustrate that, by directly supervising attention functions, the model is able to converge faster with a clearer decision boundary, and thus yields better performance.

List of keywords