Accepted Papers List

11
StockFormer: Learning Hybrid Trading Machines with Predictive Coding
Siyu Gao, Yunbo Wang, Xiaokang Yang
Typical RL-for-finance solutions directly optimize trading policies over the noisy market data, such as stock prices and trading volumes, without explicitly considering the future trends and correlations of different investment assets as we humans do. In this paper, we present StockFormer, a hybrid trading machine that integrates the forward modeling capabilities of predictive coding with the advantages of RL agents in policy flexibility. The predictive coding part consists of three Transformer branches with modified structures, which respectively extract effective latent states of long-/short-term future dynamics and asset relations. The RL agent adaptively fuses these states and then executes an actor-critic algorithm in the unified state space. The entire model is jointly trained by propagating the critic’s gradients back to the predictive coding module. StockFormer significantly outperforms existing approaches across three publicly available financial datasets in terms of portfolio returns and Sharpe ratios.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning
40
PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem
Yiyuan Wang, Chenghou Jin, Shaowei Cai, Qingwei Lin
The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. Over the last decade, although SIP is theoretically hard, researchers have designed various algorithms for solving it. In this work, we propose three main heuristics and develop an improved exact algorithm for SIP. First, we design a probing search procedure that tests whether the search can obtain a solution at first sight. Second, we design a novel matching ordering as a value-ordering heuristic, which uses information obtained from the probing search procedure to preferentially select promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of SIP and present an adaptive propagation method that strikes a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for SIP.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
53
Non-Obvious Manipulability in Extensive-Form Mechanisms: the Revelation Principle for Single-Parameter Agents
Thomas Archbold, Bart de Keijzer, Carmine Ventre
Recent work in algorithmic mechanism design focuses on designing mechanisms for agents with bounded rationality, modifying the constraints that must be satisfied in order to achieve incentive compatibility. Starting with Li’s strengthening of strategyproofness, obvious strategyproofness (OSP) requires truthtelling to be “obvious” over dishonesty, roughly meaning that the worst outcome from truthful actions must be no worse than the best outcome for dishonest ones. A celebrated result for dominant-strategy incentive-compatible mechanisms that allows us to restrict attention to direct mechanisms, known as the revelation principle, does not hold for OSP: the implementation details matter for the obvious incentive properties of the mechanism. Studying agent strategies in real-life mechanisms, Troyan and Morrill introduce a relaxation of strategyproofness known as non-obvious manipulability, which only requires comparing certain extrema of the agents’ utility functions in order for a mechanism to be incentive-compatible. Specifically, a mechanism is not obviously manipulable (NOM) if the best and worst outcomes when acting truthfully are no worse than the best and worst outcomes when acting dishonestly. In this work we first extend the cycle monotonicity framework for direct-revelation NOM mechanism design to indirect mechanisms. We then apply this to two settings, single-parameter agents and mechanisms for two agents in which one has a two-value domain, and show that under these models the revelation principle holds: direct mechanisms are just as powerful as indirect ones.
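The NOM condition stated in this abstract can be checked mechanically for a small direct mechanism. The following is a minimal sketch, assuming a hypothetical utility table for a single agent with a fixed true type; the function name and the table layout are illustrative and are not taken from the paper.

```python
def is_not_obviously_manipulable(utility, truthful_report):
    """Check the NOM condition for one agent with a fixed true type.

    utility[report] is a list of the agent's utilities, one entry per
    possible profile of the other agents' reports (hypothetical layout).
    """
    truthful = utility[truthful_report]
    best_truth, worst_truth = max(truthful), min(truthful)
    for report, outcomes in utility.items():
        if report == truthful_report:
            continue
        # A misreport is an obvious manipulation if its best case beats the
        # truthful best case, or its worst case beats the truthful worst case.
        if max(outcomes) > best_truth or min(outcomes) > worst_truth:
            return False
    return True


# Illustrative utility tables (made-up numbers).
honest = {"truth": [3, 1], "lie": [3, 0]}
manipulable = {"truth": [2, 1], "lie": [5, 0]}

print(is_not_obviously_manipulable(honest, "truth"))
print(is_not_obviously_manipulable(manipulable, "truth"))
```

Only the extrema of each row are compared, which is what makes NOM weaker than strategyproofness: a misreport may still be better for some particular profile of the other agents.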
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
58
HireVAE: an Online and Adaptive Factor Model based on Hierarchical and Regime-Switch VAE
Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin
The factor model is a fundamental tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations. However, it is still an open question how to build a factor model that can conduct stock prediction in an online and adaptive setting, where the model can adapt itself to match the current market regime identified based on only point-in-time market information. To tackle this problem, we propose the first deep learning based online and adaptive factor model, HireVAE, at the core of which is a hierarchical latent space that embeds the underlying relationship between the global market situation and stock-wise latent factors, so that HireVAE can effectively estimate useful latent factors given only historical market information and subsequently predict accurate stock returns. Across four commonly used real stock market benchmarks, the proposed HireVAE demonstrates superior performance in terms of active returns over previous methods, verifying the potential of such an online and adaptive factor model.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Applications
71
Teaching What You Should Teach: A Data-Based Distillation Method
Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu
In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias to find out what meets the teacher’s strengths but the student’s weaknesses, by learning magnitudes and probabilities to generate suitable data samples. By training the data augmentation module and the generalized distillation paradigm alternately, a student model is learned with excellent generalization ability. To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
84
Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend
Chong Yu, Tao Chen, Zhongxue Gan
Adversarial attacks are commonly regarded as a huge threat to neural networks because of the misleading behavior they induce. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch and the mutual learning mechanism with the same learning ratio applied in prior art defense strategies are the main causes of the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even to yield a certain accuracy boost for common neural models on benign classification, object detection, and segmentation tasks. Quantitative and ablation experiments attribute the efficacy of AdvAmd to three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).
List of keywords
Machine Learning -> ML: Robustness
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Constraint Satisfaction and Optimization -> CSO: Applications
86
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang
Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism into text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a specially designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding   
87
Self-supervised Graph Disentangled Networks for Review-based Recommendation
Yuyang Ren, Haonan Zhang, Qi Li, Luoyi Fu, Xinbing Wang, Chenghu Zhou
User review data is considered as auxiliary information to alleviate the data sparsity problem and improve the quality of learned user/item or interaction representations in review-based recommender systems. However, existing methods usually model user-item interactions in a holistic manner and neglect the entanglement of the latent intents behind them, e.g., price, quality, or appearance, resulting in suboptimal representations and reducing interpretability. In this paper, we propose Self-supervised Graph Disentangled Networks for review-based recommendation (SGDN) to separately model the user-item interactions based on the latent factors through the textual review data. To this end, we first model the distributions of interactions over latent factors from both semantic information in review data and structural information in user-item graph data, forming several factor graphs. Then a factorized message passing mechanism is designed to learn disentangled user/item and interaction representations on the factor graphs. Finally, we set an intent-aware contrastive learning task to alleviate the sparsity issue and encourage disentanglement through dynamically identifying positive and negative samples based on the learned intent distributions. Empirical results over five benchmark datasets validate the superiority of SGDN over the state-of-the-art methods and the interpretability of learned intent factors.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Collaborative filtering
100
Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi’s Entropy Perspective
Yuxin Dong, Tieliang Gong, Hong Chen, Chen Li
Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information theoretical measure: kernelized Rényi’s entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon’s entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Rényi’s entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
List of keywords
Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Learning theory
108
A regular matching constraint for string variables
Roberto Amadini, Peter Stuckey
Using a regular language as a pattern for string matching is nowadays a common (and sometimes unsafe) operation, provided as a built-in feature by most programming languages. A proper constraint solver over string variables should thus support most of the operations over regular expressions and related constructs. However, state-of-the-art string solvers natively support only the membership relation of a string variable to a regular language. Here we take a step forward by defining a specialised propagator for the match operation, returning the leftmost position where a pattern can match a given string. Empirical evidence shows the effectiveness of our approach, implemented within the constraint programming framework, and tested against state-of-the-art string solvers.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
109
Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning
Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen
Conversational recommendation systems (CRS) aim to acquire users’ dynamically preferred attributes through conversations, in a timely and proactive manner, for item recommendation. In each turn of CRS, there are naturally two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) to estimate the effectiveness of the director’s option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director’s option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
List of keywords
Data Mining -> DM: Recommender systems
133
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Fuchun Sun
Video action segmentation under timestamp supervision has recently received much attention due to lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which includes the following two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences, where the frames in ambiguous intervals have no pseudo-labels. Second, iterative clustering iteratively propagates the pseudo-labels to the ambiguous intervals by clustering, and thus updates the pseudo-label sequences to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
162
Decoupling with Entropy-based Equalization for Semi-Supervised Semantic Segmentation
Chuanghao Ding, Jianrong Zhang, Henghui Ding, Hongwei Zhao, Zhihui Wang, Tengfei Xing, Runbo Hu
Semi-supervised semantic segmentation methods are the main solution to alleviate the problem of high annotation consumption in semantic segmentation. However, the class imbalance problem makes the model favor the head classes with sufficient training samples, resulting in poor performance of the tail classes. To address this issue, we propose a Decoupled Semi-Supervise Semantic Segmentation (DeS4) framework based on the teacher-student model. Specifically, we first propose a decoupling training strategy to split the training of the encoder and segmentation decoder, aiming at a balanced decoder. Then, a non-learnable prototype-based segmentation head is proposed to regularize the category representation distribution consistency and perform a better connection between the teacher model and the student model. Furthermore, a Multi-Entropy Sampling (MES) strategy is proposed to collect pixel representation for updating the shared prototype to get a class-unbiased head. We conduct extensive experiments of the proposed DeS4 on two challenging benchmarks (PASCAL VOC 2012 and Cityscapes) and achieve remarkable improvements over the previous state-of-the-art methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Segmentation
169
MM-PCQA: Multi-Modal Learning for No-reference Point Cloud Quality Assessment
Zicheng Zhang, Wei Sun, Xiongkuo Min, Qiyuan Wang, Jun He, Quan Zhou, Guangtao Zhai
The visual quality of point clouds has been greatly emphasized since the ever-increasing 3D vision applications are expected to provide cost-effective and high-quality experiences for users. Looking back on the development of point cloud quality assessment (PCQA), the visual quality is usually evaluated by utilizing single-modal information, i.e., either extracted from the 2D projections or 3D point cloud. The 2D projections contain rich texture and semantic information but are highly dependent on viewpoints, while the 3D point clouds are more sensitive to geometry distortions and invariant to viewpoints. Therefore, to leverage the advantages of both point cloud and projected image modalities, we propose a novel no-reference Multi-Modal Point Cloud Quality Assessment (MM-PCQA) metric. Specifically, we split the point clouds into sub-models to represent local geometry distortions such as point shift and down-sampling. Then we render the point clouds into 2D image projections for texture feature extraction. To achieve these goals, the sub-models and projected images are encoded with point-based and image-based neural networks. Finally, symmetric cross-modal attention is employed to fuse multi-modal quality-aware information. Experimental results show that our approach outperforms all compared state-of-the-art methods and is far ahead of previous no-reference PCQA methods, which highlights the effectiveness of the proposed method. The code is available at https://github.com/zzc-1998/MM-PCQA.
List of keywords
Computer Vision -> CV: 3D computer vision
Machine Learning -> ML: Multi-modal learning
174
Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism
Xudong Guo, Daming Shi, Wenhui Fan
Communication can impressively improve cooperation in multi-agent reinforcement learning (MARL), especially for partially-observed tasks. However, existing works either broadcast the messages leading to information redundancy, or learn targeted communication by modeling all the other agents as targets, which is not scalable when the number of agents varies. In this work, to tackle the scalability problem of MARL communication for partially-observed tasks, we propose a novel framework Transformer-based Email Mechanism (TEM). The agents adopt local communication to send messages only to the ones that can be observed without modeling all the agents. Inspired by human cooperation with email forwarding, we design message chains to forward information to cooperate with the agents outside the observation range. We introduce Transformer to encode and decode the message chain to choose the next receiver selectively. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Multi-robot systems
193
SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles
Cuong Tran, Keyu Zhu, Ferdinando Fioretto, Pascal Van Hentenryck
A critical concern in data-driven processes is to build models whose outcomes do not discriminate against some demographic groups, including gender, ethnicity, or age. In learning tasks, knowledge of the group attributes is essential to ensure non-discrimination, but in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of the individuals’ sensitive information while also allowing it to learn non-discriminatory predictors. A key feature of the proposed model is to enable the use of off-the-shelf and non-private fair models to create a privacy-preserving and fair model. The paper analyzes the relation between accuracy, privacy, and fairness, and assesses the benefits of the proposed models on several prediction tasks. In particular, this proposal allows both scalable and accurate training of private and fair models for very large neural networks.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Data Mining -> DM: Privacy-preserving data mining
Machine Learning -> ML: Multi-task and transfer learning
194
MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels
ChuanYang Hu, Shipeng Yan, Zhitong Gao, Xuming He
Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction to reduce the cost is to learn with noisy labels, which are ubiquitous in real-world applications. A critical challenge for noisy image classification is to reduce the effect of network memorization on the false-labeled data. In this work, we propose an iterative selection approach capable of identifying clean data by considering the overall learning dynamics of each data instance. Different from the previous small-loss heuristics, we leverage the observation that deep networks find clean data easy to memorize and hard to forget. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times from misclassified to memorized and from memorized to misclassified in training, respectively. Then, we integrate them as the criterion for selection. Based on the proposed new criterion, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset. To validate our method, we perform exhaustive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.
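The per-instance learning dynamics mentioned in this abstract can be illustrated by tracking how an instance's prediction flips during training. The following is a minimal sketch under stated assumptions: it reads "transition times" as the number of transitions, takes a hypothetical per-epoch correctness record as input, and does not reproduce how the paper combines the two quantities into a selection criterion.

```python
def count_transitions(correct_per_epoch):
    """Count learning and forgetting events for one training instance.

    correct_per_epoch is a list of booleans (hypothetical input format):
    True if the instance was classified as its given label in that epoch.
    Returns (misclassified -> memorized count, memorized -> misclassified count).
    """
    learned = 0
    forgotten = 0
    for prev, curr in zip(correct_per_epoch, correct_per_epoch[1:]):
        if not prev and curr:
            learned += 1
        elif prev and not curr:
            forgotten += 1
    return learned, forgotten


# Illustrative record for one instance over five epochs (made-up values).
history = [False, True, True, False, True]
learned, forgotten = count_transitions(history)
print(learned, forgotten)
```

Under the abstract's observation, an instance that is memorized early and rarely forgotten would be a candidate clean sample, while one that flips repeatedly would be treated with more suspicion.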
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision
199
Approximate Envy-Freeness in Graphical Cake Cutting
Sheung Man Yuen, Warut Suksompong
We study the problem of fairly allocating a divisible resource in the form of a graph, also known as graphical cake cutting. Unlike for the canonical interval cake, a connected envy-free allocation is not guaranteed to exist for a graphical cake. We focus on the existence and computation of connected allocations with low envy. For general graphs, we show that there is always a 1/2-additive-envy-free allocation and, if the agents’ valuations are identical, a (2+\epsilon)-multiplicative-envy-free allocation for any \epsilon > 0. In the case of star graphs, we obtain a multiplicative factor of 3+\epsilon for arbitrary valuations and 2 for identical valuations. We also derive guarantees when each agent can receive more than one connected piece. All of our results come with efficient algorithms for computing the respective allocations.
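The two envy notions in this abstract can be made concrete with a small check. The following is a minimal sketch, assuming the commonly used definitions, which the abstract itself does not spell out: an allocation is alpha-additive-envy-free if no agent values another agent's piece more than alpha above its own, and alpha-multiplicative-envy-free if no agent values another piece more than alpha times its own. The matrix layout and function names are hypothetical.

```python
def additive_envy(values):
    """Smallest alpha for which the allocation is alpha-additive-envy-free.

    values[i][j] is agent i's value for the piece given to agent j
    (hypothetical layout); agent i receives piece i.
    """
    n = len(values)
    return max(values[i][j] - values[i][i] for i in range(n) for j in range(n))


def multiplicative_envy(values):
    """Smallest alpha for which the allocation is alpha-multiplicative-envy-free."""
    n = len(values)
    worst = 1.0
    for i in range(n):
        for j in range(n):
            if values[i][j] <= values[i][i]:
                continue
            if values[i][i] == 0:
                return float("inf")  # no finite factor can cover the envy
            worst = max(worst, values[i][j] / values[i][i])
    return worst


# Illustrative valuations (made-up numbers): agent 0 envies agent 1.
values = [[0.25, 0.75],
          [0.50, 0.50]]
print(additive_envy(values), multiplicative_envy(values))
```

In this example agent 0's envy is 0.5 additively and a factor of 3 multiplicatively, so the allocation would meet a 1/2-additive guarantee but not a (2+\epsilon)-multiplicative one for small \epsilon.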
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
200
LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search
Bhavna Gopal, Arjun Sridhar, Tunhou Zhang, Yiran Chen
Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6% in ImageNet under mobile constraints, best-in-class Kendall-Tau, architectural diversity, and search space size.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Automated machine learning
203
FedSampling: A Better Sampling Strategy for Federated Learning
Tao Qi, Fangzhao Wu, Lingjuan Lyu, Yongfeng Huang, Xing Xie
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different data sizes, and the clients with more data cannot have more opportunities to contribute to model training, which may lead to inferior performance. In this paper, instead of client uniform sampling, we propose a novel data uniform sampling strategy for federated learning (FedSampling), which can effectively improve the performance of federated learning especially when client data size distribution is highly imbalanced across clients. In each federated learning round, local data on each client is randomly sampled for local model learning according to a probability based on the server desired sample size and the total sample size on all available clients. Since the data size on each client is privacy-sensitive, we propose a privacy-preserving way to estimate the total sample size with a differential privacy guarantee. Experiments on four benchmark datasets show that FedSampling can effectively improve the performance of federated learning.
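The data uniform sampling idea in this abstract can be sketched in a few lines. The following is a minimal illustration, assuming that each local example is kept independently with probability K / N, where K is the server's desired sample size and N is the total sample size over all clients; the abstract only says the probability is "based on" these two quantities. The sketch also assumes N is known exactly, whereas the paper estimates it under differential privacy. All names are hypothetical.

```python
import random


def sample_local_data(local_data, desired_size, total_size, rng):
    """Keep each local example independently with probability K / N (assumed)."""
    prob = min(1.0, desired_size / total_size)
    return [example for example in local_data if rng.random() < prob]


# Illustrative setup (made-up sizes): one large and one small client.
rng = random.Random(0)
clients = {"large": list(range(900)), "small": list(range(100))}
total_size = sum(len(data) for data in clients.values())
desired_size = 100

selected = {
    name: sample_local_data(data, desired_size, total_size, rng)
    for name, data in clients.items()
}
print({name: len(chosen) for name, chosen in selected.items()})
```

Because every example has the same inclusion probability regardless of which client holds it, clients with more data contribute proportionally more examples per round, which is the contrast with client uniform sampling drawn in the abstract.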
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Privacy-preserving data mining
204
A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning
Alexander Zadorojniy, Takayuki Osogami, Orit Davidovich
We consider the problem of risk-aware Markov Decision Processes (MDPs) for Safe AI. We introduce a theoretical framework, Extended Markov Ratio Decision Processes (EMRDP), that incorporates risk into MDPs and embeds environment learning into this framework. We propose an algorithm to find the optimal policy for EMRDP with theoretical guarantees. Under a certain monotonicity assumption, this algorithm runs in strongly-polynomial time both in the discounted and expected average reward models. We validate our algorithm empirically on a Grid World benchmark, evaluating its solution quality, required number of steps, and numerical stability. We find its solution quality to be stable under data noising, while its required number of steps grows with added noise. We observe its numerical stability compared to global methods.
List of keywords
Planning and Scheduling -> PS: Markov decisions processes
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Sequential decision making
222
Shhh! The Logic of Clandestine Operations
Pavel Naumov, Oliver Orejola
An operation is called covert if it conceals the identity of the actor; it is called clandestine if the very fact that the operation is conducted is concealed. The paper proposes a formal semantics of clandestine operations and introduces a sound and complete logical system that describes the interplay between the distributed knowledge modality and a modality capturing coalition power to conduct clandestine operations.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
223
Appearance Prompt Vision Transformer for Connectome Reconstruction
Rui Sun, Naisong Luo, Yuwen Pan, Huayu Mai, Tianzhu Zhang, Zhiwei Xiong, Feng Wu
Neural connectivity reconstruction aims to understand the function of biological reconstruction, and promotes basic scientific research. The intricate morphology and densely intertwined branches make it an extremely challenging task. Most previous best-performing methods adopt affinity learning or metric learning. Nevertheless, they either neglect to model explicit voxel semantics because of implicit optimization or lag in exploiting spatial information. Furthermore, the inherent locality of 3D CNNs limits the modeling of long-range dependencies and leads to sub-optimal results. In this work, we propose a coherent and unified Appearance Prompt Vision Transformer (APViT) to integrate affinity and metric learning to exploit the complementarity by learning long-range spatial dependencies. The proposed APViT enjoys several merits. First, the extension continuity-aware attention module aims at constructing hierarchical attention customized for neuron extensibility and slice continuity to learn instance voxel semantic context from a global perspective and utilize continuity priors to enhance voxel spatial awareness. Second, the appearance prompt modulator is responsible for leveraging voxel-adaptive appearance knowledge conditioned on affinity rich in spatial information to instruct instance voxel semantics, exploiting the potential of affinity learning to complement metric learning. Extensive experimental results on multiple challenging benchmarks demonstrate that our APViT achieves consistent improvements with huge flexibility under the same post-processing strategy.
List of keywords
Computer Vision -> CV: Biomedical image analysis
228
Adversarial Behavior Exclusion for Safe Reinforcement Learning
Md Asifur Rahman, Tongtong Liu, Sarra Alqahtani
Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, this learning process makes RL inherently too vulnerable to be used in real-world applications where safety is of utmost importance. Most prior studies consider exploration at odds with safety and thereby restrict it using either joint optimization of task and safety or imposing constraints for safe exploration. This paper migrates from the current convention to using exploration as a key to safety by learning safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent’s safety violations by approximating an optimal adversary utilizing exploration and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall and therefore can be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision processes (CMDP) environments under 2 white-box action space perturbations as well as with changes in environment dynamics against 7 baselines. Consistently, AdvEx-RL outperforms the baselines by achieving an average safety performance of over 75% in the continuous action space with 10 times more variations in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis as we show in our user study. This paper provides a novel study to investigate the robustness and interpretability of safe RL methods under deliberate perturbations.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
243
A Solution to Co-occurence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition
Yibo Zhou, Hai-Miao Hu, Jinzuo Yu, Zhenbo Xu, Weiqing Lu, Yuran Cao
Recent studies on pedestrian attribute recognition progress with either explicit or implicit modeling of the co-occurrence among attributes. Considering that this prior is highly variable and unforeseeable across specific scenarios, we show that current methods can actually suffer in generalizing such fitted attribute interdependencies onto scenes or identities off the dataset distribution, resulting in the underlying bias of attribute co-occurrence. To render models robust in realistic scenes, we propose attributes-disentangled feature learning to ensure that the recognition of an attribute does not rely on the existence of others, which is then formulated as a problem of mutual information minimization. Building on this, practical strategies are devised to efficiently decouple attributes, which substantially improve the baseline and establish state-of-the-art performance on realistic datasets like PETAzs and RAPzs.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Bias, fairness and privacy
251
Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation
Deyi Ji, Feng Zhao, Hongtao Lu
Most existing ultra-high resolution (UHR) segmentation methods struggle to balance memory cost and local characterization accuracy, both of which are taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer), which achieves impressive performance. In this work, GPWFormer is a Transformer (T)-CNN (C) mutual learning framework, where T takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while C takes the downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computational complexity, T partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. In addition, masks produced by C are utilized to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.
List of keywords
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Segmentation
256
A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification
Mingcai Chen, Yu Zhao, Zhonghuang Wang, Bing He, Jianhua Yao
Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model’s predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known “confirmation bias” problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method’s effectiveness and superior performance on sequence-level and repertoire-level tasks. Code available at https://github.com/TencentAILabHealthcare/NLL-IRC.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine
260
Generalization Bounds for Adversarial Metric Learning
Wen Wen, Han Li, Hong Chen, Rui Wu, Lingjuan Wu, Liangxuan Zhu
Recently, adversarial metric learning has been proposed to enhance the robustness of the learned distance metric against adversarial perturbations. Despite rapid progress in validating its effectiveness empirically, theoretical guarantees on adversarial robustness and generalization are far less understood. To fill this gap, this paper focuses on unveiling the generalization properties of adversarial metric learning by developing the uniform convergence analysis techniques. Based on the capacity estimation of covering numbers, we establish the first high-probability generalization bounds with order O(n^{-1/2}) for adversarial metric learning with pairwise perturbations and general losses, where n is the number of training samples. Moreover, we obtain the refined generalization bounds with order O(n^{-1}) for the smooth loss by using local Rademacher complexity, which is faster than the previous result of adversarial pairwise learning, e.g., adversarial bipartite ranking. Experimental evaluation on real-world datasets validates our theoretical findings.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Learning theory
263
Improving LaCAM for Scalable Eventually Optimal Multi-Agent Pathfinding
Keisuke Okumura
This study extends the recently developed LaCAM algorithm for multi-agent pathfinding (MAPF). LaCAM is a sub-optimal search-based algorithm that uses lazy successor generation to dramatically reduce the planning effort. We present two enhancements. First, we propose its anytime version, called LaCAM*, which eventually converges to optima, provided that solution costs are accumulated transition costs. Second, we improve the successor generation to quickly obtain initial solutions. Exhaustive experiments demonstrate their utility. For instance, LaCAM* sub-optimally solved 99% of the instances retrieved from the MAPF benchmark, where the number of agents varied up to a thousand, within ten seconds on a standard desktop PC, while ensuring eventual convergence to optima, opening a new horizon for MAPF algorithms.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Distributed and multi-agent planning
Robotics -> ROB: Motion and path planning
Planning and Scheduling -> PS: Planning algorithms
268
NerCo: A Contrastive Learning based Two-stage Chinese NER Method
Zai Zhang, Bin Shi, Haokun Zhang, Huang Xu, Yaodong Zhang, Yuefei Wu, Bo Dong, Qinghua Zheng
Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition (NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in the target representation space, which could ultimately harm the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. We then present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo first guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart those of different classes. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance. We release our code at https://github.com/ijcainer/nerco.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities
Natural Language Processing -> NLP: Tagging, chunking, and parsing
283
A Canonicalization-Enhanced Known Fact-Aware Framework For Open Knowledge Graph Link Prediction
Yilin Wang, Minghao Hu, Zhen Huang, Dongsheng Li, Xicheng Lu, Wei Luo, Dong Yang
Open knowledge graph (OpenKG) link prediction aims to predict missing factual triples in the form of (head noun phrase, relation phrase, tail noun phrase), where existing triples are extracted from texts by open information extraction tools. Since triples are not canonicalized, previous methods either focus on canonicalizing noun phrases (NPs) to reduce graph sparsity, or utilize textual forms to improve type compatibility. However, they neglect to canonicalize relation phrases (RPs) and triples, making OpenKG maintain high sparsity and impeding the performance. To address the above issues, we propose a Canonicalization-Enhanced Known Fact-Aware (CEKFA) framework that boosts link prediction performance through sparsity reduction of RPs and triples. First, a similarity-driven RP canonicalization method is proposed to reduce RPs’ sparsity by sharing knowledge of semantically similar ones. Second, to reduce the sparsity of triples, a known fact-aware triple canonicalization method is designed to retrieve relevant known facts from training data. Finally, these two types of canonical information are integrated into a general two-stage re-ranking framework. Experiment results on two OpenKG datasets, ReVerb20K and ReVerb45K, show that our approach achieves state-of-the-art results. Extensive experimental analyses illustrate the effectiveness and generalization ability of the proposed framework.
List of keywords
Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Information retrieval
Natural Language Processing -> NLP: Applications
292
Recognizable Information Bottleneck
Yilin Lyu, Xin Liu, Mingyang Song, Xinyue Wang, Yaxin Peng, Tieyong Zeng, Liping Jing
Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generalization bound. However, it requires the computation of expensive second-order curvature, which hinders its practical application. In this paper, we establish the connection between the recognizability of representations and the recent functional conditional mutual information (f-CMI) generalization bound, which is significantly easier to estimate. On this basis we propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic optimized by density ratio matching under the Bregman divergence. Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Classification
298
3D Surface Super-resolution from Enhanced 2D Normal Images: A Multimodal-driven Variational AutoEncoder Approach
Wuyuan Xie, Tengcong Huang, Miaohui Wang
3D surface super-resolution is an important technical tool in virtual reality, which is also a research hotspot in the field of computer vision. Due to the unstructured and irregular nature of 3D object data, it is usually difficult to obtain high-quality surface details and geometry textures via a low-cost hardware setup. In this paper, we establish a multimodal-driven variational autoencoder (mmVAE) framework to perform 3D surface enhancement based on 2D normal images. To fully leverage the multimodal learning, we investigate a multimodal Gaussian mixture model (mmGMM) to align and fuse the latent feature representations from different modalities, and further propose a cross-scale encoder-decoder structure to reconstruct high-resolution normal images. Experimental results on several benchmark datasets demonstrate that our method delivers promising surface geometry structures and details in comparison with competitive advances.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: 3D computer vision
299
Model Conversion via Differentially Private Data-Free Distillation
Bochao Liu, Pengju Wang, Shikun Li, Dan Zeng, Shiming Ge
While many valuable deep models trained on large-scale data have been released to facilitate the artificial intelligence community, they may encounter attacks in deployment which lead to privacy leakage of training data. In this work, we propose a learning approach termed differentially private data-free distillation (DPDFD) for model conversion that can convert a pretrained model (teacher) into its privacy-preserving counterpart (student) via an intermediate generator without access to training data. The learning coordinates three parties in a unified way. First, massive synthetic data are generated with the generator. Then, they are fed into the teacher and student to compute differentially private gradients by normalizing the gradients and adding noise before performing descent. Finally, the student is updated with these differentially private gradients and the generator is updated by taking the student as a fixed discriminator in an alternate manner. In addition to a privacy-preserving student, the generator can generate synthetic data in a differentially private way for other downstream tasks. We theoretically prove that our approach can guarantee differential privacy and good convergence. Extensive experiments, in which our approach significantly outperforms other differentially private generative approaches, demonstrate its effectiveness.
List of keywords
Data Mining -> DM: Privacy-preserving data mining
Computer Vision -> CV: Bias, fairness and privacy
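The DPDFD abstract above mentions computing differentially private gradients "by normalizing the gradients and adding noise before performing descent". A minimal sketch of that general pattern follows. It assumes per-sample gradients are given as plain lists, that each is scaled to unit L2 norm, and that the noise is Gaussian; the function name, the Gaussian choice, and the noise scale are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def private_gradient(per_sample_grads, noise_std, rng):
    """Average unit-normalized per-sample gradients with added noise.

    Hypothetical helper: each gradient is scaled to L2 norm 1 (zero
    gradients are left as zeros), the results are summed, Gaussian noise
    with standard deviation noise_std is added to every coordinate, and
    the total is divided by the number of samples.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(v * v for v in grad))
        scale = 1.0 / norm if norm > 0 else 0.0
        for i, v in enumerate(grad):
            total[i] += v * scale
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, noise_std)) / n for t in total]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 2.0]]
noisy = private_gradient(grads, noise_std=1.0, rng=rng)
clean = private_gradient(grads, noise_std=0.0, rng=rng)
```

Normalizing bounds the influence of any single sample on the sum, which is what lets the added noise be calibrated to a privacy budget; how that calibration is done in DPDFD is not stated in the abstract.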
307
Self-supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong
The performance of existing supervised neuron segmentation methods is highly dependent on the amount of accurate annotations, especially when applied to large scale electron microscope (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pre-training inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model is able to capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Self-supervised Learning
326
TDG4Crowd: Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing
Yili Fang, Chaojie Shen, Huamao Gu, Tao Han, Xinyi Ding
In crowdsourcing, existing efforts mainly use real datasets collected from crowdsourcing as test datasets to evaluate the effectiveness of aggregation algorithms. However, these works ignore the fact that the datasets obtained by crowdsourcing are usually sparse and imbalanced due to limited budgets. As a result, applying the same aggregation algorithm to different datasets often leads to contradictory conclusions. For example, on the RTE dataset, the Dawid and Skene model performs significantly better than Majority Voting, while on the LabelMe dataset, the experiments give the opposite conclusion. It is challenging to obtain comprehensive and balanced datasets at a low cost. To the best of our knowledge, little effort has been made toward the fair evaluation of aggregation algorithms. To fill this gap, we propose a novel method named TDG4Crowd that can automatically generate comprehensive and balanced datasets. Using the Kullback–Leibler divergence and the Kolmogorov–Smirnov test, the experimental results show the superiority of our method compared with others. Aggregation algorithms also perform more consistently on the synthetic datasets generated using our method.
List of keywords
Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Cost-sensitive learning
328
Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks
Carlos Martin, Tuomas Sandholm
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients. Such game access is common in reinforcement learning settings, where the environment is typically treated as a black box. To tackle this problem, we apply zeroth-order optimization techniques that combine smoothed gradient estimators with equilibrium-finding dynamics. We model players’ strategies using artificial neural networks. In particular, we use randomized policy networks to model mixed strategies. These take noise in addition to an observation as input and can flexibly represent arbitrary observation-dependent, continuous-action distributions. Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria. We evaluate the performance of our method using an approximation of the Nash convergence metric from game theory, which measures how much players can benefit from unilaterally changing their strategy. We apply our method to continuous Colonel Blotto games, single-item and multi-item auctions, and a visibility game. The experiments show that our method can quickly find a high-quality approximate equilibrium. Furthermore, they show that the dimensionality of the input noise is crucial for performance. To our knowledge, this paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
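The abstract above refers to zeroth-order optimization with smoothed gradient estimators. As a generic illustration of that family of techniques (not the specific estimator or dynamics used in the paper), the sketch below estimates the gradient of a Gaussian-smoothed black-box function from function values only, using antithetic perturbation pairs. The function names and the toy linear payoff are hypothetical.

```python
import random


def smoothed_gradient(f, x, sigma, num_samples, rng):
    """Estimate the gradient of a Gaussian-smoothed version of f at x.

    Only evaluations of f are used. Each sample draws a standard normal
    direction u and compares f(x + sigma*u) with f(x - sigma*u).
    """
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        coeff = (plus - minus) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += coeff * u[i]
    return [g / num_samples for g in grad]


def payoff(x):
    # Toy black-box payoff whose true gradient is (3, -2) everywhere.
    return 3.0 * x[0] - 2.0 * x[1]


estimate = smoothed_gradient(payoff, [0.5, -1.0], 0.1, 20000, random.Random(0))
```

In a game setting, each player would apply such an estimate to its own payoff as a function of its own strategy parameters; combining these estimates with equilibrium-finding dynamics is the part described in the abstract and left out here.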
339
SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations
Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim
Robotic agents trained using reinforcement learning have the problem of taking unreliable actions in an out-of-distribution (OOD) state. Agents can easily become OOD in real-world environments because it is almost impossible for them to visit and learn the entire state space during training. Unfortunately, unreliable actions do not ensure that agents perform their original tasks successfully. Therefore, agents should be able to recognize whether they are in OOD states and learn how to return to the learned state distribution rather than continue to take unreliable actions. In this study, we propose a novel method for retraining agents to recover from OOD situations in a self-supervised manner when they fall into OOD states. Our in-depth experimental results demonstrate that our method substantially improves the agent’s ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. Moreover, we show that our method can retrain the agent to recover from OOD situations even when in-distribution states are difficult to visit through exploration. Code and supplementary materials are available at https://github.com/SNUChanKim/SeRO.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Self-supervised Learning
Robotics -> ROB: Learning in robotics
345
OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking
Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang
The two-stage point-to-box network plays a critical role in the recently popular 3D Siamese tracking paradigm, which first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. To address these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network’s discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Motion and tracking
Robotics -> ROB: Robotics and vision
350
Calibrating a Deep Neural Network with Its Predecessors
Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu
Confidence calibration – the process to calibrate the output probability distribution of neural networks – is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitations of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves the state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift. Supplementary material and code are available at https://github.com/Linwei94/PCS
List of keywords
Machine Learning -> ML: Classification
375
Spike Count Maximization for Neuromorphic Vision Recognition
Jianxiong Tang, Jian-Huang Lai, Xiaohua Xie, Lingxiao Yang
Spiking Neural Networks (SNNs) are promising models for neuromorphic vision recognition. The mean square error (MSE) and cross-entropy (CE) losses are widely applied to supervise the training of SNNs on neuromorphic datasets. However, the relevance between the output spike counts and predictions is not well modeled by the existing loss functions. This paper proposes a Spike Count Maximization (SCM) training approach for the SNN-based neuromorphic vision recognition model based on optimizing the output spike counts. The SCM is achieved by structural risk minimization (SRM) and a specially designed spike counting loss. The spike counting loss counts the output spikes of the SNN by using the L0-norm, and the SRM maximizes the distance between the margin boundaries of the classifier to ensure the generalization of the model. The SCM is non-smooth and non-differentiable, and we design an iterative algorithm with fast convergence to solve the problem. Experimental results demonstrate that the SCM performs satisfactorily in most cases. Using the output spikes for prediction, the accuracies of SCM are 2.12% to 16.50% higher than those of the popular training losses on the CIFAR10-DVS dataset. The code is available at https://github.com/TJXTT/SCM-SNN.
List of keywords
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
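The SCM abstract above says the output spikes are counted with the L0-norm and used for prediction. A small sketch of what that counting amounts to is given below, assuming each output neuron emits a binary spike train over discrete time steps and that the predicted class is the neuron with the most spikes. The argmax rule and all names are assumptions for illustration; the training objective itself is not reproduced.

```python
def spike_counts(spike_trains):
    """Count spikes per output neuron, i.e. the L0-norm of each train."""
    return [sum(1 for s in train if s != 0) for train in spike_trains]


def predict(spike_trains):
    """Return the index of the output neuron that fired most often."""
    counts = spike_counts(spike_trains)
    return max(range(len(counts)), key=counts.__getitem__)


# Hypothetical output of a 3-class SNN over 5 time steps.
outputs = [
    [0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
counts = spike_counts(outputs)      # [2, 4, 1]
predicted_class = predict(outputs)  # 1
```

Because the count is a step-like function of the underlying activity, it has no useful gradient, which is consistent with the abstract's remark that the resulting problem is non-smooth and non-differentiable.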
379
Eliminating the Computation of Strongly Connected Components in Generalized Arc Consistency Algorithm for AllDifferent Constraint
Luhan Zhen, Zhanshan Li, Yanzhi Li, Hongbo Li
The AllDifferent constraint is widely used in Constraint Programming to model real-world problems. Existing Generalized Arc Consistency (GAC) algorithms map an AllDifferent constraint onto a bipartite graph and utilize the structure of Strongly Connected Components (SCCs) in the graph to filter values. Calculating SCCs is time-consuming in the existing algorithms, so in this paper we propose a novel GAC algorithm for the AllDifferent constraint that eliminates the computation of SCCs. We prove that all redundant edges in the bipartite graph point to some alternating cycles. Our algorithm exploits this property and uses a more efficient method to filter values, which is based on breadth-first search. Experimental results on the XCSP3 benchmark suite show that our algorithm considerably outperforms the state-of-the-art GAC algorithms.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Constraint Satisfaction and Optimization -> CSO: Constraint programming
382
Learning to Learn from Corrupted Data for Few-Shot Learning
Yuexuan An, Xingyu Zhao, Hui Xue
Few-shot learning, which aims to generalize knowledge learned from annotated base training data to recognize unseen novel classes, has attracted considerable attention. Existing few-shot methods rely on completely clean training data. However, in the real world, the training data are often corrupted and accompanied by noise due to disturbance in data transmission and low-quality annotation, which severely degrades the performance and generalization capability of few-shot models. To address the problem, we propose a unified peer-collaboration learning (PCL) framework to extract valid knowledge from corrupted data for few-shot learning. PCL leverages two modules to mimic the peer collaboration process, which cooperatively evaluates the importance of each sample. Specifically, each module first estimates the importance weights of different samples by encoding the information provided by the other module from both global and local perspectives. Then, both modules leverage the obtained importance weights to guide the reevaluation of the loss value of each sample. In this way, the peers can mutually absorb knowledge to improve the robustness of few-shot models. Experiments verify that our framework combined with different few-shot methods can significantly improve the performance and robustness of original models.
List of keywords
Machine Learning -> ML: Few-shot learning
389
Why Rumors Spread Fast in Social Networks, and How to Stop It
Ahad N. Zehmakan, Charlotte Out, Sajjad Hesamipour Khelejan
We study a rumor spreading model where individuals are connected via a network structure. Initially, only a small subset of the individuals are spreading a rumor. Each individual who is connected to a spreader, starts spreading the rumor with some probability as a function of their trust in the spreader, quantified by the Jaccard similarity index. Furthermore, the probability that a spreader diffuses the rumor decreases over time until they fully lose their interest and stop spreading. We focus on determining the graph parameters which govern the magnitude and pace that the rumor spreads in this model. We prove that for the rumor to spread to a sizable fraction of the individuals, the network needs to enjoy “strong” expansion properties and most nodes should be in “well-connected” communities. Both of these characteristics are, arguably, present in real-world social networks up to a certain degree, shedding light on the driving force behind the extremely fast spread of rumors in social networks. Furthermore, we formulate a large range of countermeasures to cease the spread of a rumor. We introduce four fundamental criteria which a countermeasure ideally should possess. We evaluate all the proposed countermeasures by conducting experiments on real-world social networks such as Facebook and Twitter. We conclude that our novel decentralized countermeasures (which are executed by the individuals) generally outperform the previously studied centralized ones (which need to be imposed by a third entity such as the government).
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MDA: Web and social networks
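The abstract above quantifies trust between a node and a spreader by the Jaccard similarity index. A minimal sketch of that index on a graph follows, assuming it is taken over the two nodes' neighbor sets (the abstract does not say whether open or closed neighborhoods are used); the function name `jaccard_similarity` and the toy graph are hypothetical and not taken from the paper.

```python
def jaccard_similarity(adjacency, u, v):
    """Jaccard index of the neighbor sets of u and v (0.0 if both are empty)."""
    neighbors_u, neighbors_v = adjacency[u], adjacency[v]
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical undirected toy graph, stored as node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

# "a" and "b" share the neighbor "c" out of three distinct neighbors overall.
trust_ab = jaccard_similarity(toy_graph, "a", "b")
```

Under this assumption, two nodes in the same tightly knit community share many neighbors and get a high index, which matches the abstract's emphasis on "well-connected" communities.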
396
A Large-Scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Zinuo Li, Xuhang Chen, Shuqiang Wang, Chi-Man Pun
Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. The numerous datasets that have emerged in the field of image enhancement so far are not film-specific. To facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet, based on the Laplacian Pyramid, for stylizing images across frequency bands and achieving film style outcomes. Experiments reveal that the performance of our model is superior to state-of-the-art techniques. Our dataset and code are available at https://github.com/CXH-Research/FilmNet.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision
398
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
Xuewei Li, Tao Wu, Gaoang Wang, Zhongang Qi, Ying Shan, Xi Li
As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of the original 360 degree data. Their performance drops substantially when the input panoramic images contain 3D disturbance. To be more robust to 3D disturbance, we propose our Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules, i.e., spherical geometry-aware image projection, spherical deformable patch embedding, and a panorama-aware loss, which take input images with 3D disturbance into account, add a spherical geometry-aware constraint on the existing deformable patch embedding, and indicate the pixel density of the original 360 degree data, respectively. Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Recognition (object detection, categorization)
399
Asynchronous Communication Aware Multi-Agent Task Allocation
Ben Rachmut, Sofia Amador Nelke, Roie Zivan
Multi-agent task allocation problems in physical environments with spatial and temporal constraints are hard problems that are relevant in many realistic applications. A task allocation algorithm based on Fisher market clearing (FMC_TA), which can be performed either centrally or distributively, has been shown to produce high-quality allocations in comparison to both centralized and distributed state-of-the-art incomplete optimization algorithms. However, the algorithm is synchronous and therefore depends on perfect communication between agents. We propose FMC_ATA, an asynchronous version of FMC_TA, which is robust to message latency and message loss. In contrast to the former version of the algorithm, FMC_ATA allows agents to identify dynamic events and initiate the generation of an updated allocation. Thus, it is better suited to dynamic environments. We further investigate the conditions in which the distributed version of the algorithm is preferred over the centralized version. Our results indicate that the proposed asynchronous distributed algorithm produces consistent results even when the communication level is extremely poor.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Agent communication
Constraint Satisfaction and Optimization -> CSO: Distributed constraints
400
Measuring Acoustics with Collaborative Multiple Agents
Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun
As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by setting up a loudspeaker and microphone in the environment for all source/receiver locations, which is time-consuming and inefficient. We propose to let two robots measure the environment’s acoustics by actively moving and emitting/receiving sweep signals. We also devise a collaborative multi-agent policy where these two robots are trained to explore the environment’s acoustics while being rewarded for wide exploration and accurate prediction. We show that the robots learn to collaborate and move to explore environment acoustics while minimizing the prediction error. To the best of our knowledge, we present the very first problem formulation and solution to the task of collaborative environment acoustics measurements with multiple agents.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Cooperative games
408
Learning Object Consistency and Interaction in Image Generation from Scene Graphs
Yangkang Zhang, Chenye Meng, Zejian Li, Pei Chen, Guang Yang, Changyuan Yang, Lingyun Sun
This paper is concerned with synthesizing images conditioned on a scene graph (SG), a set of object nodes and their edges of interactive relations. We divide existing works into image-oriented and code-oriented methods. In our analysis, the image-oriented methods do not consider object interaction in spatial hidden feature. On the other hand, in empirical study, the code-oriented methods lose object consistency as their generated images miss certain objects in the input scene graph. To alleviate these two issues, we propose Learning Object Consistency and Interaction (LOCI). To preserve object consistency, we design a consistency module with a weighted augmentation strategy for objects easy to be ignored and a matching loss between scene graphs and image codes. To learn object interaction, we design an interaction module consisting of three kinds of message propagation between the input scene graph and the learned image code. Experiments on COCO-stuff and Visual Genome datasets show our proposed method alleviates the ignorance of objects and outperforms the state-of-the-art on visual fidelity of generated images and objects.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Computer Vision -> CV: Scene analysis and understanding   
429
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Jiakai Sun, Zhanjie Zhang, Jiafu Chen, Guangyuan Li, Boyan Ji, Lei Zhao, Wei Xing
Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using the voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance field in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Applications
434
Unreliable Partial Label Learning with Recursive Separation
Yu Shi, Ning Xu, Hua Yuan, Xin Geng
Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, and among which only one is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, the self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.
List of keywords
Machine Learning -> ML: Weakly supervised learning
435
ViT-CX: Causal Explanation of Vision Transformers
Weiyan Xie, Xiao-Hui Li, Caleb Chen Cao, Nevin Zhang
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specially for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs such as causal overdetermination are considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job revealing all important evidence for the predictions than previous methods. The explanation generated by ViT-CX also shows significantly better faithfulness to the model.
List of keywords
Computer Vision -> CV: Interpretability and transparency
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning
449
CTW: Confident Time-Warping for Time-Series Label-Noise Learning
Peitian Ma, Zhen Liu, Junhao Zheng, Linghao Wang, Qianli Ma
Noisy labels seriously degrade the generalization ability of Deep Neural Networks (DNNs) in various classification tasks. Existing studies on label-noise learning mainly focus on computer vision. However, time series also suffer from the same issue. Directly applying the methods from computer vision to time series may reduce the temporal dependency due to different data characteristics. How to make use of the properties of time series to enable DNNs to learn robust representations in the presence of noisy labels has not been fully explored. To this end, this paper proposes a method that expands the distribution of Confident instances by Time-Warping (CTW) to learn robust representations of time series. Specifically, since applying the augmentation method to all data may introduce extra mislabeled data, we select confident instances to implement Time-Warping. In addition, we normalize the distribution of the training loss of each class to eliminate the model’s selection preference for instances of different classes, alleviating the class imbalance caused by sample selection. Extensive experimental results show that CTW achieves state-of-the-art performance on the UCR datasets when dealing with different types of noise. Besides, the t-SNE visualization of our method verifies that augmenting confident data improves the generalization ability.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Time series and data streams
451
Invertible Residual Neural Networks with Conditional Injector and Interpolator for Point Cloud Upsampling
Yaqi Duan, Aihua Mao, Yu-Hui Wen, Zihui Du, Hongmin Cai, Yong-Jin Liu
Point clouds obtained by LiDAR and other sensors are usually sparse and irregular. Low-quality point clouds have serious influence on the final performance of downstream tasks. Recently, a point cloud upsampling network with normalizing flows has been proposed to address this problem. However, the network heavily relies on designing specialized architectures to achieve invertibility. In this paper, we propose a novel invertible residual neural network for point cloud upsampling, called PU-INN, which allows unconstrained architectures to learn more expressive feature transformations. Then, we propose a conditional injector to improve nonlinear transformation ability of the neural network while guaranteeing invertibility. Furthermore, a lightweight interpolator is proposed based on semantic similarity distance in the latent space, which can intuitively reflect the interpolation changes in Euclidean space. Qualitative and quantitative results show that our method outperforms the state-of-the-art works in terms of distribution uniformity, proximity-to-surface accuracy, 3D reconstruction quality, and computation efficiency.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Machine Learning -> ML: Probabilistic machine learning
460
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
Xiangze Jia, Hui Zhou, Xinge Zhu, Yandong Guo, Ji Zhang, Yuexin Ma
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Unlike the commonly adopted self-supervised strategies based on data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-Frame Fusion block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance and thereby predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Scene analysis and understanding   
479
Causal Deep Reinforcement Learning using Observational Data
Wenxuan Zhu, Chao Yu, Qiang Zhang
Deep reinforcement learning (DRL) requires the collection of plenty of interventional data, which is sometimes expensive and even unethical in the real world, such as in the autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with the existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
485
KDLGT: A Linear Graph Transformer Framework via Kernel Decomposition Approach
Yi Wu, Yanyang Xu, Wenhao Zhu, Guojie Song, Zhouchen Lin, Liang Wang, Shaoguo Liu
In recent years, graph Transformers (GTs) have been demonstrated as a robust architecture for a wide range of graph learning tasks. However, the quadratic complexity of GTs limits their scalability on large-scale data, in comparison to Graph Neural Networks (GNNs). In this work, we propose the Kernel Decomposition Linear Graph Transformer (KDLGT), an accelerating framework for building scalable and powerful GTs. KDLGT employs the kernel decomposition approach to rearrange the order of matrix multiplication, thereby reducing complexity to linear. Additionally, it categorizes GTs into three distinct types and provides tailored accelerating methods for each category to encompass all types of GTs. Furthermore, we provide a theoretical analysis of the performance gap between KDLGT and self-attention to ensure its effectiveness. Under this framework, we select two representative GTs to design our models. Experiments on both real-world and synthetic datasets indicate that KDLGT not only achieves state-of-the-art performance on various datasets but also reaches an acceleration ratio of approximately 10 on graphs of certain sizes.
List of keywords
Data Mining -> DM: Big data and scalability
Data Mining -> DM: Mining graphs
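The KDLGT abstract above attributes its linear complexity to rearranging the order of matrix multiplication after a kernel decomposition. The sketch below illustrates that general idea only and is not KDLGT itself: it assumes an `elu(x) + 1` feature map, a common choice in the linear-attention literature that the abstract does not name, and every function name here is hypothetical. Both routines compute the same output, but the second never forms the n x n score matrix.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def feature_map(m):
    """Assumed positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in m]


def quadratic_attention(q, k, v):
    """(phi(Q) phi(K)^T) V with row normalization: builds an n x n matrix."""
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))
    out = matmul(scores, v)
    return [[val / sum(srow) for val in orow] for srow, orow in zip(scores, out)]


def linear_attention(q, k, v):
    """phi(Q) (phi(K)^T V): same result, cost linear in the number of nodes n."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # column sums of phi(K), length d
    out = matmul(fq, kv)
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[val / z for val in orow] for orow, z in zip(out, norms)]


# Hypothetical toy inputs: n = 3 nodes, feature dimension d = 2.
Q = [[0.1, -0.4], [0.7, 0.2], [-0.3, 0.5]]
K = [[0.3, 0.9], [-0.6, 0.1], [0.2, -0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

The two results agree up to floating-point error because matrix multiplication is associative; only the cost changes, from quadratic to linear in n for a fixed feature dimension.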
490
Physics-Guided Human Motion Capture with Pose Probability Modeling
Jingyi Ju, Buzhen Huang, Chen Zhu, Zhihao Li, Yangang Wang
Incorporating physics in human motion capture to avoid artifacts like floating, foot sliding, and ground penetration is a promising direction. Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module. However, due to depth ambiguity, monocular motion capture inevitably suffers from noise, and the noisy reference often leads to failure in physics-based tracking. To address these obstacles, our key idea is to employ physics as denoising guidance in the reverse diffusion process to reconstruct physically plausible human motion from a modeled pose probability distribution. Specifically, we first train a latent Gaussian model that encodes the uncertainty of 2D-to-3D lifting to facilitate reverse diffusion. Then, a physics module is constructed to track the motion sampled from the distribution. The discrepancies between the tracked motion and image observation are used to provide explicit guidance for the reverse diffusion model to refine the motion. Over several iterations, the physics-based tracking and kinematic denoising promote each other to generate physically plausible human motion. Experimental results show that our method outperforms previous physics-based methods in both joint accuracy and success rate.
List of keywords
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: 3D computer vision
526
Constraints First: A New MDD-based Model to Generate Sentences Under Constraints
Alexandre Bonlarron, Aurelie Calabrese, Pierre Kornprobst, Jean-Charles Régin
This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
534
On the Fairness Impacts of Private Ensembles Models
Cuong Tran, Ferdinando Fioretto
The Private Aggregation of Teacher Ensembles (PATE) is a machine learning framework that enables the creation of private models through the combination of multiple "teacher" models and a "student" model. The student model learns to predict an output based on the voting of the teachers, and the resulting model satisfies differential privacy. PATE has been shown to be effective in creating private models in semi-supervised settings or when protecting data labels is a priority. This paper explores whether the use of PATE can result in unfairness, and demonstrates that it can lead to accuracy disparities among groups of individuals. The paper also analyzes the algorithmic and data properties that contribute to these disproportionate impacts, explains why these aspects affect different groups disproportionately, and offers recommendations for mitigating these effects.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Computer Vision -> CV: Bias, fairness and privacy
Multidisciplinary Topics and Applications -> MDA: Security and privacy
536
BPNet: Bézier Primitive Segmentation on 3D Point Clouds
Rao Fu, Cheng Wen, Qian Li, Xiao Xiao, Pierre Alliez
This paper proposes BPNet, a novel end-to-end deep learning framework to learn Bézier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We conducted extensive experiments on synthetic datasets (ABC Dataset) and real-scan datasets to validate and compare our approach with different baseline methods. Experiments show superior performance over previous work in terms of segmentation, with a substantially faster testing speed.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Segmentation
539
Rubik’s Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture
Yingjie Li, Weilu Gao, Cunxi Yu
Recently, there have been increasing efforts to advance optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. Given the considerable benefits in computation speed and energy efficiency, there is significant interest in applying ONNs to medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONN architecture, namely RubikONNs, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware, similar to rotating a Rubik’s Cube. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms, RotAgg and RotSeq, are proposed. Our experimental results demonstrate more than 4x improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.
List of keywords
Multidisciplinary Topics and Applications -> MDA: AI hardware
Machine Learning -> ML: Classification
Multidisciplinary Topics and Applications -> MDA: Physical sciences
541
On Approximating Total Variation Distance
Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S Meel, Dimitrios Myrisiotis, A Pavan, Vinodchandran N. Variyam
Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms. 2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
List of keywords
Machine Learning -> ML: Other
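The TV-distance abstract above contrasts total variation, which is #P-complete to compute exactly for product distributions, with measures such as Hellinger that tensorize over the marginals. A small sketch of that contrast follows, assuming each product distribution over {0,1}^n is given as a list of per-coordinate probabilities of a 1, and using the convention H^2 = 1 - sum_x sqrt(P(x) Q(x)); all function names are hypothetical. The TV routine simply enumerates all 2^n points and is not the paper's approximation scheme.

```python
from itertools import product
from math import prod, sqrt


def product_pmf(marginals, x):
    """Probability of bit vector x when coordinate i is 1 with probability marginals[i]."""
    return prod(p if bit else 1.0 - p for p, bit in zip(marginals, x))


def tv_distance_bruteforce(p, q):
    """Exact TV distance by summing over all 2^n points (exponential time)."""
    points = product((0, 1), repeat=len(p))
    return 0.5 * sum(abs(product_pmf(p, x) - product_pmf(q, x)) for x in points)


def hellinger_squared(p, q):
    """Squared Hellinger distance in O(n): the Bhattacharyya coefficient of a
    product distribution is the product of the per-coordinate coefficients."""
    coefficient = prod(
        sqrt(pi * qi) + sqrt((1.0 - pi) * (1.0 - qi)) for pi, qi in zip(p, q)
    )
    return 1.0 - coefficient


# Hypothetical example: P has mixed marginals, Q is uniform on {0,1}^3.
P = [0.9, 0.2, 0.5]
Q = [0.5, 0.5, 0.5]
tv = tv_distance_bruteforce(P, Q)
h2 = hellinger_squared(P, Q)
```

The Hellinger computation touches each coordinate once, while the brute-force TV sum grows as 2^n, which is the gap that motivates the approximation results in the abstract.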
547
Ensemble Reinforcement Learning in Continuous Spaces — A Hierarchical Multi-Step Approach for Policy Training
Gang Chen, Victoria Huang
Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research has shown that actor-critic DRL algorithms often fail to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have been proposed lately to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Ensemble methods
555
Strategic Adversarial Attacks in AI-assisted Decision Making to Reduce Human Trust and Reliance
Zhuoran Lu, Zhuoyan Li, Chun-Wei Chiang, Ming Yin
With the increased integration of AI technologies in human decision making processes, adversarial attacks on AI models become a greater concern than ever before as they may significantly hurt humans’ trust in AI models and decrease the effectiveness of human-AI collaboration. While many adversarial attack methods have been proposed to decrease the performance of an AI model, limited attention has been paid to understanding how these attacks will impact the human decision makers interacting with the model, and accordingly, how to strategically deploy adversarial attacks to maximize the reduction of human trust and reliance. In this paper, through a human-subject experiment, we first show that in AI-assisted decision making, the timing of the attacks largely influences how much humans decrease their trust in and reliance on AI: the decrease is particularly salient when attacks occur on decision making tasks about which humans are themselves highly confident. Based on these insights, we next propose an algorithmic framework to infer the human decision maker’s hidden trust in the AI model and dynamically decide when the attacker should launch an attack on the model. Our evaluations show that by following the proposed approach, attackers deploy more efficient attacks and achieve higher utility than by adopting other baseline strategies.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-computer interaction
580
Robust Steganography without Embedding based on Secure Container Synthesis and Iterative Message Recovery
Ziping Ma, Yuesheng Zhu, Guibo Luo, Xiyao Liu, Gerald Schaefer, Hui Fang
Synthesis-based steganography without embedding (SWE) methods transform secret messages to container images synthesised by generative networks, which avoids the distortions of container images and thus can fundamentally resist typical steganalysis tools. However, existing methods suffer from weak message recovery robustness, low synthesis fidelity, and the risk of message leakage. To solve these problems, we propose a novel robust steganography without embedding method in this paper. In our method, we design a secure weight-modulation-based generator by introducing secure factors to hide secret messages in synthesised container images. In this manner, the synthesised results are modulated by secure factors and thus the secret messages are inaccessible when using fake factors, which reduces the risk of message leakage. Furthermore, we design a difference predictor via the reconstruction of tampered container images together with an adversarial training strategy to iteratively update the estimation of hidden messages. In this manner, the robustness of recovering hidden messages is ensured, and the degradation of synthesis fidelity is reduced since the generator is not included in the adversarial training. Extensive experimental results demonstrate that our method is effective in avoiding message leakage and superior to existing methods in terms of recovery robustness and synthesis fidelity.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Security and privacy
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
586
PasCore: A Chinese Overlapping Relation Extraction Model Based on Global Pointer Annotation Strategy
Peng Wang, Jiafeng Xie, Xiye Chen, Guozheng Li, Wei Li
Recent work on extracting relations from texts has achieved excellent performance. However, existing studies mainly focus on simple relation extraction, and these methods do not perform well on the overlapping triple problem because the tags of shared entities conflict with each other. In particular, overlapping entities are common and indispensable in Chinese. To address this issue, this paper proposes PasCore, which utilizes a global pointer annotation strategy for overlapping relation extraction in Chinese. PasCore first obtains the sentence vector via a general pre-trained model encoder and uses a classifier to predict relations. Subsequently, it uses a global pointer annotation strategy for head entity annotation, which uses global tags to label the start and end positions of the entities. Finally, PasCore integrates the relation, head entity and its type to mark the tail entity. Furthermore, PasCore performs conditional layer normalization to fuse features, which connects all stages and greatly enriches the association between relations and entities. Experimental results on both Chinese and English real-world datasets demonstrate that PasCore outperforms strong baselines on relation extraction and, especially, shows superior performance on overlapping relation extraction.
List of keywords
Natural Language Processing -> NLP: Information extraction
Data Mining -> DM: Knowledge graphs and knowledge base completion
596
Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction
Pengcheng Lei, Faming Fang, Guixu Zhang, Ming Xu
Magnetic resonance imaging (MRI) tasks often involve multiple contrasts. Recently, numerous deep learning-based multi-contrast MRI super-resolution (SR) and reconstruction methods have been proposed to explore the complementary information from the multi-contrast images. However, these methods either construct parameter-sharing networks or manually design fusion rules, failing to accurately model the correlations between multi-contrast images and lacking interpretability. In this paper, we propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm with a well-designed data fidelity term. Specifically, we build an observation model for the multi-contrast MR images to explicitly model the multi-contrast images as common features and unique features. In this way, only the useful information in the reference image can be transferred to the target image, while the inconsistent information will be ignored. We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model. In particular, the proximal operators are replaced by learnable ResNets. In addition, multi-scale dictionaries are introduced to further improve the model performance. We test our MC-CDic model on multi-contrast MRI SR and reconstruction tasks. Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods. Code is available at https://github.com/lpcccc-cv/MC-CDic.
List of keywords
Computer Vision -> CV: Biomedical image analysis
604
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen
Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with natural language descriptions. Current methods either fail to leverage the local details or are computationally expensive. What’s worse, they fail to leverage the heterogeneous concepts in data. In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide the coarse feature into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts correspond to a set of textual concepts, we propose an adaptive pooling method to aggregate semantic concepts to address the partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior at efficiency and granularity, ensuring fine-grained interactions using a similar computational complexity as coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms the existing state-of-the-art methods.
List of keywords
Computer Vision -> CV: Image and video retrieval 
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Vision and language 
607
Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint
Haoming Li, Xinzhuo Lin, Yang Zhou, Xiang Li, Yuchi Huo, Jiming Chen, Qi Ye
3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because the physical contact is sensitive to small changes in pose, the highly nonlinear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
619
Controlling Neural Style Transfer with Deep Reinforcement Learning
Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu
Controlling the degree of stylization in Neural Style Transfer (NST) is tricky since it usually requires hand-engineering of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps. It is a style-transfer method that users can control easily. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Applications
Computer Vision -> CV: Applications
621
Compositional Zero-Shot Artistic Font Synthesis
Xiang Li, Lei Wu, Changshuo Wang, Lei Meng, Xiangxu Meng
Recently, many researchers have made remarkable achievements in the field of artistic font synthesis, with impressive glyph style and effect style in the results. However, due to less exploration in style disentanglement, it is difficult for existing methods to envision unseen style (glyph-effect) compositions of artistic fonts, so they can only learn seen style compositions. To solve this problem, we propose a novel compositional zero-shot artistic font synthesis GAN (CAFS-GAN), which allows the synthesis of unseen style compositions by exploring the visual independence and joint compatibility of encoding semantics between glyph and effect. Specifically, we propose two contrast-based style encoders to achieve style disentanglement, since glyph and effect are intertwined in the image. Meanwhile, to preserve more glyph and effect detail, we propose a generator based on hierarchical dual styles AdaIN to reorganize content-styles representations from structure to texture gradually. Extensive experiments demonstrate the superiority of our model in generating high-quality artistic font images with unseen style compositions against other state-of-the-art methods. The source code and data are available at moonlight03.github.io/CAFS-GAN/.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
639
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
Zenan Shi, Haipeng Chen, Long Chen, Dong Zhang
In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns. Compared to the existing methods that only focus on the discrepant-specific patterns (e.g., noises, textures, and frequencies), our method generalizes better. Specifically, we first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. DisGE consists of two branches, where the mainstream backbone branch is used to extract general semantic features, and the accessorial discrepant external attention branch is used to extract explicit forgery cues. Besides, a Double-Head Reconstruction (DouHR) module is proposed to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns, such that the forgery detection capability on unknown patterns can be improved. Extensive experimental results on four challenging datasets validate the effectiveness of our proposed method against state-of-the-art competitors.
List of keywords
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
644
Dichotomous Image Segmentation with Frequency Prior Knowledge
Yan Zhou, Bo Dong, Yuanfeng Wu, Wentao Zhu, Geng Chen, Yanning Zhang
Dichotomous image segmentation (DIS) has a wide range of real-world applications and has gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative Frequency Prior Knowledge (FPK). Our model, called FPK-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency priors generator to jointly utilize fixed filters and learnable filters to extract informative FPK. Before embedding the FPK into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency priors embedding module to embed the FPK into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FPK-DIS outperforms 17 state-of-the-art methods by a large margin in terms of key evaluation metrics.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Scene analysis and understanding   
Robotics -> ROB: Perception
648
HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning
Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang, Yue Qi
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-independent and identically distributed (non-IID) data among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies on four datasets demonstrate that HyperFed is effective in enhancing the performance of FL under the non-IID setting.
List of keywords
Machine Learning -> ML: Federated learning
Computer Vision -> CV: Representation learning
656
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle-rendering. The constructed training samples are closely aligned to the testing instances, without the need of data annotation. To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Computer Vision -> CV: Vision and language 
658
Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection
Supeng Wang, Yuxi Li, Ming Xie, Mingmin Chi, Yabiao Wang, Chengjie Wang, Wenbing Zhu
Change detection is a widely adopted technique in remote sensing imagery (RSI) analysis to discover long-term geomorphic evolution. To highlight the areas of semantic changes, previous efforts mostly pay attention to learning representative feature descriptors of a single image, while the difference information is either modeled with simple difference operations or implicitly embedded in feature interactions. Nevertheless, such difference modeling can be noisy since it suffers from non-semantic changes and lacks explicit guidance from image content or context. In this paper, we revisit the importance of feature difference for change detection in RSI, and propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling (APD). Firstly, alignment leverages the contextual similarity to compensate for non-semantic difference in feature space. Next, a difference module trained with semantic-wise perturbation is adopted to learn more generalized change estimators, which reversely bootstraps feature extraction and prediction. Finally, a decoupled dual-decoder structure is designed to predict semantic changes in both content-aware and content-agnostic manners. Extensive experiments are conducted on the LEVIR-CD, WHU-CD and DSIFN-CD benchmarks, demonstrating that our proposed operations bring significant improvement and achieve competitive results under the same conditions.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
660
Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning
Xinyang Huang, Chuang Zhu, Wenkai Chen
In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples at multiple levels. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA. Our code is available at https://github.com/bupt-ai-cz/ProML.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Recognition (object detection, categorization)
663
Multi-level Graph Contrastive Prototypical Clustering
Yuchao Zhang, Yuan Yuan, Qi Wang
Recently, graph neural networks (GNNs) have drawn a surge of investigations in deep graph clustering. Nevertheless, existing approaches are predominantly semantic-agnostic, since GNNs exhibit inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations from different granularities may conflict with each other, yielding severe performance degradation for clustering. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate the issue of conflicting objectives, we propose to perceive representations of different granularities within individual feature-, prototypical-, and cluster-level spaces through feature decorrelation, prototype contrast, and cluster space consistency, respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC against the state-of-the-art graph clustering approaches.
List of keywords
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning
668
Universal Adaptive Data Augmentation
Xiaogang Xu, Hengshuang Zhao
Existing automatic data augmentation (DA) methods either ignore updating DA’s parameters according to the target model’s state during training or adopt update strategies that are not effective enough. In this work, we design a novel data augmentation strategy called “Universal Adaptive Data Augmentation” (UADA). Different from existing methods, UADA adaptively updates DA’s parameters according to the target model’s gradient information during training: given a pre-defined set of DA operations, we randomly decide types and magnitudes of DA operations for every data batch during training, and adaptively update DA’s parameters along the gradient direction of the loss with respect to DA’s parameters. In this way, UADA can increase the training loss of the target networks, and the target networks learn features from harder samples to improve generalization. Moreover, UADA is very general and can be utilized in numerous tasks, e.g., image classification, semantic segmentation and object detection. Extensive experiments with various models are conducted on CIFAR-10, CIFAR-100, ImageNet, tiny-ImageNet, Cityscapes, and VOC07+12 to demonstrate the significant performance improvements brought by UADA.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
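The update rule described in the UADA abstract above (moving the augmentation parameters along the gradient of the training loss so that samples become harder) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a one-parameter linear model, a hypothetical additive "shift" augmentation with magnitude `m`, and a hand-derived gradient.

```python
def loss(w, x, y, m):
    """Squared error of a toy linear model w * x on an input shifted by m."""
    return (w * (x + m) - y) ** 2


def grad_wrt_magnitude(w, x, y, m):
    """Analytic derivative of the loss with respect to the magnitude m."""
    return 2.0 * (w * (x + m) - y) * w


def adapt_magnitude(w, x, y, m, step=0.1):
    """One gradient-ascent step on m, which raises the training loss."""
    return m + step * grad_wrt_magnitude(w, x, y, m)


# Hypothetical values: model weight, one sample, and an initial magnitude
# (the abstract says types and magnitudes are chosen randomly per batch).
w, x, y = 2.0, 1.0, 1.0
m0 = 0.5
m1 = adapt_magnitude(w, x, y, m0)

print(loss(w, x, y, m0), loss(w, x, y, m1))
```

The magnitude is pushed in the direction that increases the loss, so the second printed value is larger than the first; a real system would alternate this step with ordinary gradient descent on the model weights.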
669
Graph Propagation Transformer for Graph Representation Learning
Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi
This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e., node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. The results show that our method outperforms many state-of-the-art transformer-based graph models. The code and models will be made available.
List of keywords
Machine Learning -> ML: Applications
Machine Learning -> ML: Attention models
Machine Learning -> ML: Sequence and graph learning
680
First-Choice Maximality Meets Ex-ante and Ex-post Fairness
Xiaoxi Guo, Sujoy Sikdar, Lirong Xia, Yongzhi Cao, Hanpin Wang
We design randomized mechanisms for assigning multiple indivisible items to a group of agents given their ordinal preferences. We show that our mechanisms output assignments that satisfy desirable combinations of efficiency and fairness properties both ex-ante and ex-post. The generalized eager Boston mechanism is ex-ante envy-free and ex-post envy-free up to one item (EF1). The generalized probabilistic Boston mechanism satisfies EF1 and an ex-ante guarantee of efficiency instead of fairness. Our mechanisms are also ex-post Pareto-efficient and first-choice maximal, i.e., they maximize the number of agents assigned their first choices. In doing so, we expand the frontiers of simultaneously providing efficiency and both ex-ante and ex-post fairness guarantees for the assignment problem.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division
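First-choice maximality, as defined in the abstract above, can be checked directly on small instances. The sketch below is an illustration and not the paper's mechanism. It assumes each agent has a strict ranking over items and each item goes to at most one agent, so the largest achievable number of agents receiving their first choice equals the number of distinct top-ranked items. The function names and data layout are hypothetical.

```python
def first_choice_count(prefs, assignment):
    """Number of agents whose bundle contains their top-ranked item."""
    return sum(
        1
        for agent, ranking in prefs.items()
        if ranking[0] in assignment.get(agent, set())
    )


def max_first_choice_count(prefs):
    """Each item is held by one agent, so distinct top items give the maximum."""
    return len({ranking[0] for ranking in prefs.values()})


def is_first_choice_maximal(prefs, assignment):
    """True if no assignment gives more agents their first choice."""
    return first_choice_count(prefs, assignment) == max_first_choice_count(prefs)


# Three agents, four items; rankings list items from most to least preferred.
prefs = {
    "a1": ["x", "y", "z", "w"],
    "a2": ["x", "z", "y", "w"],
    "a3": ["y", "x", "w", "z"],
}
good = {"a1": {"x"}, "a2": {"z", "w"}, "a3": {"y"}}
bad = {"a1": {"y"}, "a2": {"x", "w"}, "a3": {"z"}}

print(is_first_choice_maximal(prefs, good), is_first_choice_maximal(prefs, bad))
```

Agents a1 and a2 compete for item x, so at most two agents can receive their first choice; `good` reaches that bound and `bad` does not.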
683
Non-Lambertian Multispectral Photometric Stereo via Spectral Reflectance Decomposition
Jipeng Lv, Heng Guo, Guanying Chen, Jinxiu Liang, Boxin Shi
Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image captured under multispectral illuminations. Existing MPS methods adopt the Lambertian reflectance model to make the problem tractable, but it greatly limits their application to real-world surfaces. In this paper, we propose a deep neural network named NeuralMPS to solve the MPS problem under non-Lambertian spectral reflectances. Specifically, we present a spectral reflectance decomposition model to disentangle the spectral reflectance into a geometric component and a spectral component. With this decomposition, we show that the MPS problem for surfaces with a uniform material is equivalent to the conventional photometric stereo (CPS) with unknown light intensities. In this way, NeuralMPS reduces the difficulty of the non-Lambertian MPS problem by leveraging the well-studied non-Lambertian CPS methods. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision
684
Feature Staleness Aware Incremental Learning for CTR Prediction
Zhikai Wang, Yanyan Shen, Zibin Zhang, Kangyi Lin
Click-through Rate (CTR) prediction in real-world recommender systems often deals with billions of user interactions every day. To improve the training efficiency, it is common to update the CTR prediction model incrementally using the new incremental data and a subset of historical data. However, the feature embeddings of a CTR prediction model often get stale when the corresponding features do not appear in current incremental data. In the next period, the model would have a performance degradation on samples containing stale features, which we call the feature staleness problem. To mitigate this problem, we propose a Feature Staleness Aware Incremental Learning method for CTR prediction (FeSAIL) which adaptively replays samples containing stale features. We first introduce a staleness-aware sampling algorithm (SAS) to sample a fixed number of stale samples with high sampling efficiency. We then introduce a staleness-aware regularization mechanism (SAR) for a fine-grained control of the feature embedding updating. We instantiate FeSAIL with a general deep learning-based CTR prediction model and the experimental results demonstrate FeSAIL outperforms various state-of-the-art methods on four benchmark datasets. The code can be found in https://github.com/cloudcatcher888/FeSAIL.
List of keywords
Data Mining -> DM: Recommender systems
Machine Learning -> ML: Incremental learning
694
CADParser: A Learning Approach of Sequence Modeling for B-Rep CAD
Shengdi Zhou, Bin Zhou, Tianyi Tang
Computer-Aided Design (CAD) plays an essential role in industrial manufacturing. An entire manufactured object always contains geometry information and the construction workflow. With the construction information, a parametric CAD model can be re-edited effectively. Unlike the mesh or point cloud, boundary representation (B-Rep) is a standard commercial format for the geometry structure. Since there are no uniform criteria to store the construction workflow, JSON format is an alternative. Unfortunately, most manufactured CAD models on the Internet only provide geometry information without the construction procedure, which reduces the efficiency of creation. This paper proposes a learning approach to infer the underlying modeling sequences given a B-Rep CAD model by treating the CAD geometry structure as a graph and the construction workflow as a sequence. Since the existing CAD dataset only contains two operations (i.e., Sketch and Extrusion), limiting the diversity of the CAD model creation, we introduce a large-scale dataset with diverse operations (e.g., Revolution, Fillet, Chamfer). Each model includes both the geometry structure and the construction sequences. Extensive experiments demonstrate that our method outperforms the existing state-of-the-art methods quantitatively and qualitatively.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
704
FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning
Yuanyuan Chen, Zichen Chen, Pengcheng Wu, Han Yu
Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Learning sparse models
Machine Learning -> ML: Optimization
705
Some Might Say All You Need Is Sum
Eran Rosenbluth, Jan Tönshoff, Martin Grohe
The expressivity of Graph Neural Networks (GNNs) is dependent on the aggregation functions they employ. Theoretical works have pointed towards Sum aggregation GNNs subsuming all other GNNs, while certain practical works have observed a clear advantage to using Mean and Max. An examination of the theoretical guarantee identifies two caveats. First, it is size-restricted, that is, the power of every specific GNN is limited to graphs of a certain maximal size. Successfully processing larger graphs may require another GNN, and so on. Second, it concerns the power to distinguish non-isomorphic graphs, not the power to approximate general functions on graphs, and the former does not necessarily imply the latter. It is important that a GNN’s usability not be limited to graphs of any certain maximal size. Therefore, we explore the realm of unrestricted-size expressivity. We prove that simple functions, which can be computed exactly by Mean or Max GNNs, are inapproximable by any Sum GNN. We prove that under certain restrictions, every Mean or Max GNN can be approximated by a Sum GNN, but even there, a combination of (Sum, [Mean/Max]) is more expressive than Sum alone. Lastly, we prove further expressivity limitations of Sum-GNNs.
List of keywords
Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Learning theory
Machine Learning -> ML: Sequence and graph learning
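As a rough intuition for the aggregation question raised in the abstract above, the sketch below compares Sum and Mean aggregation over multisets of scalar neighbor features. It is not taken from the paper and does not reproduce its proofs; the function names `sum_agg` and `mean_agg` and the toy neighborhoods are assumptions made for illustration.

```python
def sum_agg(neighbor_feats):
    """Sum aggregation over a multiset of scalar neighbor features."""
    return sum(neighbor_feats)


def mean_agg(neighbor_feats):
    """Mean aggregation over a multiset of scalar neighbor features."""
    return sum(neighbor_feats) / len(neighbor_feats)


# Two hypothetical neighborhoods with the same feature value but different sizes.
small = [1.0, 1.0]
large = [1.0, 1.0, 1.0, 1.0]

# Mean gives the same value for both neighborhoods; Sum separates them.
print(mean_agg(small) == mean_agg(large))
print(sum_agg(small) == sum_agg(large))

# Sum grows with the neighborhood size, while Mean stays within the feature range.
print(sum_agg([1.0] * 1000), mean_agg([1.0] * 1000))
```

The last line hints at why graph size matters: a Sum aggregate scales with the number of neighbors, whereas a Mean aggregate does not, so behavior that looks equivalent on small graphs can diverge as graphs grow.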
721
Complexity of Efficient Outcomes in Binary-Action Polymatrix Games and Implications for Coordination Problems
Argyrios Deligkas, Gregory Gutin, Eduard Eiben, Philip Neary, Anders Yeo
We investigate the difficulty of finding economically efficient solutions to coordination problems on graphs. Our work focuses on two forms of coordination: games of strategic complements (pure-coordination) and games of strategic substitutes (anti-coordination). We consider three objectives in the context of simple binary-action polymatrix games: (a) maximizing welfare, (b) maximizing potential, and (c) finding a welfare-maximizing Nash equilibrium. We introduce an intermediate, new graph-partition problem, termed Maximum Weighted Digraph Partition, which is of independent interest, and we provide a dichotomy for it. This dichotomy, among other results, provides as a corollary a dichotomy for objective (a) for general binary-action polymatrix games. In addition, it reveals that the complexity of achieving these objectives varies depending on the form of the coordination problem. Specifically, objectives (a) and (b) can be efficiently solved in pure-coordination games, but are NP-hard in anti-coordination games. Finally, we show that objective (c) is NP-hard even for the simplest non-trivial pure-coordination games.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
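To make objective (a) from the abstract above concrete, here is a minimal brute-force sketch of welfare maximization in a binary-action polymatrix game. It is an illustration only, not the paper's algorithm: the payoff-table layout, the helper names `welfare` and `max_welfare`, and the example game are all assumptions, and the search is exponential in the number of players.

```python
from itertools import product


def welfare(profile, edge_payoffs):
    """Total payoff of all players under an action profile.

    edge_payoffs[(u, v)][a_u][a_v] is the pair (payoff to u, payoff to v)
    for the two-player game played on edge (u, v).
    """
    total = 0
    for (u, v), table in edge_payoffs.items():
        payoff_u, payoff_v = table[profile[u]][profile[v]]
        total += payoff_u + payoff_v
    return total


def max_welfare(n, edge_payoffs):
    """Try all 2^n binary action profiles and return a welfare-maximizing one."""
    return max(product((0, 1), repeat=n), key=lambda p: welfare(p, edge_payoffs))


# Hypothetical pure-coordination game on the path 0-1-2:
# matching on action 1 pays 2 to each endpoint, matching on action 0 pays 1.
coord = [[(1, 1), (0, 0)],
         [(0, 0), (2, 2)]]
game = {(0, 1): coord, (1, 2): coord}

best = max_welfare(3, game)
print(best, welfare(best, game))
```

In this toy instance every player choosing action 1 maximizes welfare. Swapping `coord` for an anti-coordination table (payoffs only when the endpoints differ) gives an instance of the other game form discussed in the abstract.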
728
Dynamic Belief for Decentralized Multi-Agent Cooperative Learning
Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian
Decentralized multi-agent cooperative learning is a practical task due to the partially observed setting in both training and execution. Every agent learns to cooperate without access to the observations and policies of others. However, decentralized training of multiple agents is very difficult due to non-stationarity, especially when other agents’ policies are also being learned during training. To overcome this, we propose to learn a dynamic policy belief for each agent to predict the current policies of other agents and condition its own policy accordingly. To quickly adapt to the development of others’ policies, we introduce a historical context to learn the belief inference according to a few recent action histories of other agents, and a latent variational inference to model their policies by a learned distribution. We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in decentralized training settings and comparable results with state-of-the-art CTDE methods.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
729
The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks
Luca Marzari, Davide Corsi, Ferdinando Cicalese, Alessandro Farinelli
Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving. While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements. In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound of the correct count while significantly reducing computational requirements. We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
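The counting view described in the abstract above can be sketched on a toy example. The code below is not the paper's method: it assumes a hypothetical one-neuron ReLU network, an assumed safety property (output must not exceed 0), and a discretized input grid, then contrasts exact enumeration with plain Monte Carlo sampling plus a Hoeffding bound, which is swapped in here as a generic stand-in for a randomized approximate counter.

```python
import math
import random
from itertools import product


def relu(z):
    return max(0.0, z)


def toy_net(x1, x2):
    """Hypothetical one-neuron ReLU network (not from the paper)."""
    return relu(x1 - x2) - 3.0


def is_unsafe(x1, x2):
    """Assumed safety property: the output must not exceed 0."""
    return toy_net(x1, x2) > 0.0


GRID = range(11)  # assumed discretized input domain {0, ..., 10}^2


def exact_count():
    """Enumerate every input configuration and count the violations."""
    return sum(is_unsafe(a, b) for a, b in product(GRID, GRID))


def sampled_count(n_samples, delta, rng):
    """Monte Carlo estimate of the violation count, with a Hoeffding
    half-width that holds with probability at least 1 - delta."""
    total = len(GRID) ** 2
    hits = sum(is_unsafe(rng.choice(GRID), rng.choice(GRID))
               for _ in range(n_samples))
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    return total * hits / n_samples, total * eps


exact = exact_count()
estimate, half_width = sampled_count(2000, 0.05, random.Random(0))
print(exact, estimate, half_width)
```

Exact enumeration scales with the size of the input domain, which is why an approximate counter with a probabilistic guarantee becomes attractive for realistic networks.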
731
Generalization Guarantees of Self-Training of Halfspaces under Label Noise Corruption
Lies Hadjadj, Massih-Reza Amini, Sana Louhichi
We investigate the generalization properties of a self-training algorithm with halfspaces. The approach learns a list of halfspaces iteratively from labeled and unlabeled training data, in which each iteration consists of two steps: exploration and pruning. In the exploration phase, the halfspace is found sequentially by maximizing the unsigned-margin among unlabeled examples and then assigning pseudo-labels to those that have a distance higher than the current threshold. These pseudo-labels are allegedly corrupted by noise. The training set is then augmented with noisy pseudo-labeled examples, and a new classifier is trained. This process is repeated until no more unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples that have a distance to the last halfspace greater than the associated unsigned-margin are then discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded and show that the resulting semi-supervised approach never degrades performance compared to the classifier learned using only the initial labeled training set. Experiments carried out on a variety of benchmarks demonstrate the efficiency of the proposed approach compared to state-of-the-art methods.
List of keywords
Machine Learning -> ML: Learning theory
Machine Learning -> ML: Semi-supervised learning
736
Sequential Recommendation with Probabilistic Logical Reasoning
Huanhuan Yuan, Pengpeng Zhao, Xuefeng Xian, Guanfeng Liu, Yanchi Liu, Victor S. Sheng, Lei Zhao
Deep learning and symbolic learning are two frequently employed methods in Sequential Recommendation (SR). Recent neural-symbolic SR models demonstrate their potential to equip SR with concurrent perception and cognition capacities. However, neural-symbolic SR remains a challenging problem due to open issues like representing users and items in logical reasoning. In this paper, we combine Deep Neural Network (DNN) SR models with logical reasoning and propose a general framework named Sequential Recommendation with Probabilistic Logical Reasoning (SR-PLR for short). This framework allows SR-PLR to benefit from both similarity matching and logical reasoning by disentangling feature embedding and logic embedding in the DNN and probabilistic logic network. To better capture the uncertainty and evolution of user tastes, SR-PLR embeds users and items with a probabilistic method and conducts probabilistic logical reasoning on users’ interaction patterns. Then the feature and logic representations learned from the DNN and logic network are concatenated to make the prediction. Finally, experiments on various sequential recommendation models demonstrate the effectiveness of SR-PLR. Our code is available at https://github.com/Huanhuaneryuan/SR-PLR.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Collaborative filtering
747
Learning Few-shot Sample-set Operations for Noisy Multi-label Aspect Category Detection
Shiman Zhao, Wei Chen, Tengjiao Wang
Multi-label Aspect Category Detection (MACD) is essential for aspect-based sentiment analysis, which aims to identify multiple aspect categories in a given sentence. Few-shot MACD is critical due to the scarcity of labeled data. However, MACD is a high-noise task, and existing methods fail to address it with only two or three training samples per class, which limits their application in practice. To solve the above issues, we propose a group of Few-shot Sample-set Operations (FSO) to solve noisy MACD in fewer-sample scenarios by identifying the semantic contents of samples. Learning interactions among intersection, subtraction, and union networks, the FSO imitates arithmetic operations on samples to distinguish relevant and irrelevant aspect contents. Eliminating the negative effect caused by noise, the FSO extracts discriminative prototypes and customizes a dedicated query vector for each class. Besides, we design a multi-label architecture, which integrates score-wise loss and multi-label loss to optimize the FSO for multi-label prediction, avoiding complex threshold training or selection. Experiments show that our method achieves considerable performance. Significantly, it improves Macro-F by up to 11.01% and by 8.59% on average in fewer-sample scenarios.
List of keywords
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Dialogue and interactive systems
752
Federated Probabilistic Preference Distribution Modelling with Compactness Co-Clustering for Privacy-Preserving Multi-Domain Recommendation
Weiming Liu, Chaochao Chen, Xinting Liao, Mengling Hu, Jianwei Yin, Yanchao Tan, Longfei Zheng
With the development of modern internet techniques, Cross-Domain Recommendation (CDR) systems have been widely exploited for tackling the data-sparsity problem. Meanwhile, most current CDR models assume that user-item interactions are accessible across different domains. However, such a knowledge-sharing process would violate privacy protection policies. In this paper, we focus on the Privacy-Preserving Multi-Domain Recommendation problem (PPMDR). The problem is challenging since different domains are sparse and heterogeneous under privacy protection. To tackle the above issues, we propose Federated Probabilistic Preference Distribution Modelling (FPPDM). FPPDM includes two main components, i.e., a local domain modelling component and a global server aggregation component with a federated learning strategy. The local domain modelling component aims to exploit user/item preference distributions using the rating information in the corresponding domain. The global server aggregation component is set to combine user characteristics across domains. To better extract semantic neighbor information among the users, we further provide a compactness co-clustering strategy in FPPDM++ to cluster the users with similar characteristics. Our empirical studies on benchmark datasets demonstrate that FPPDM/FPPDM++ significantly outperform the state-of-the-art models.
List of keywords
Data Mining -> DM: Recommender systems
757
Mean Payoff Optimization for Systems of Periodic Service and Maintenance
David Klaska, Antonin Kucera, Vit Musil, Vojtech Rehak
Consider oriented graph nodes requiring periodic visits by a service agent. The agent moves among the nodes and receives a payoff for each completed service task, depending on the time elapsed since the previous visit to a node. We consider the problem of finding a suitable schedule for the agent to maximize its long-run average payoff per time unit. We show that the problem of constructing an epsilon-optimal schedule is PSPACE-hard for every fixed non-negative epsilon, and that there exists an optimal periodic schedule of exponential length. We propose randomized finite-memory (RFM) schedules as a compact description of the agent’s strategies and design an efficient algorithm for constructing RFM schedules. Furthermore, we construct deterministic periodic schedules by sampling from RFM schedules.
List of keywords
Planning and Scheduling -> PS: Robot planning
Planning and Scheduling -> PS: Routing
Robotics -> ROB: Motion and path planning
774
RFENet: Towards Reciprocal Feature Evolution for Glass Segmentation
Ke Fan, Changan Wang, Yabiao Wang, Chengjie Wang, Ran Yi, Lizhuang Ma
Glass-like objects are widespread in daily life but remain difficult for most existing methods to segment. Their transparency makes them hard to distinguish from the background, while the tiny separation boundary further impedes the acquisition of their exact contour. In this paper, by revealing the key co-evolution demand of semantic and boundary learning, we propose a Multi-scale Selective Mutual (MSM) module to enable reciprocal feature learning between them. Then, to exploit the global shape context, we propose a Structurally Attentive Refinement (SAR) module to conduct fine-grained feature refinement for the ambiguous points around the boundary. To further utilize the multi-scale information, we design a cascaded structure that combines the above two novel modules, and finally introduce the Reciprocal Feature Evolution Network (RFENet) for effective glass-like object segmentation. Extensive experiments demonstrate that our RFENet achieves state-of-the-art performance on three popular public datasets.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Scene analysis and understanding
792
Optimal Seat Arrangement: What Are the Hard and Easy Cases?
Esra Ceylan, Jiehua Chen, Sanjukta Roy
We study four NP-hard optimal seat arrangement problems [Bodlaender et al., 2020a], each of which takes as input a set of n agents, where each agent has cardinal preferences over other agents, and an n-vertex undirected graph (called the seat graph). The task is to assign each agent to a distinct vertex in the seat graph such that either the sum of utilities or the minimum utility is maximized, or the arrangement is envy-free or exchange-stable. Aiming at identifying hard and easy cases, we extensively study the algorithmic complexity of the four problems by looking into natural graph classes for the seat graph (e.g., paths, cycles, stars, or matchings), problem-specific parameters (e.g., the number of non-isolated vertices in the seat graph or the maximum number of agents towards whom an agent has non-zero preferences), and preference structures (e.g., non-negative or symmetric preferences). For strict preferences and seat graphs with disjoint edges and isolated vertices, we correct an error by Bodlaender et al. [2020b] and show that finding an envy-free arrangement remains NP-hard in this case.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Cooperative games
Game Theory and Economic Paradigms -> GTEP: Fair division
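The problem statement in the abstract above can be illustrated with a small brute-force sketch. It is not the paper's algorithm: it assumes that an agent's utility is the sum of its preferences toward the agents seated at adjacent vertices, the helper names `utilities` and `best_arrangement` and the example preferences are hypothetical, and the search over all n! arrangements is only feasible for tiny instances.

```python
from itertools import permutations


def utilities(assignment, prefs, seat_edges):
    """Return each agent's utility, where assignment[s] is the agent at seat s.

    Assumption: an agent's utility is the sum of its preferences toward
    the agents sitting in adjacent seats of the seat graph.
    """
    util = [0] * len(assignment)
    for s, t in seat_edges:
        a, b = assignment[s], assignment[t]
        util[a] += prefs[a][b]
        util[b] += prefs[b][a]
    return util


def best_arrangement(prefs, seat_edges, objective=sum):
    """Brute force over all arrangements; objective=min gives the
    variant that maximizes the minimum utility."""
    n = len(prefs)
    return max(permutations(range(n)),
               key=lambda asg: objective(utilities(asg, prefs, seat_edges)))


# Three agents on a path seat graph 0-1-2; prefs[a][b] is a's preference toward b.
prefs = [[0, 3, 0],
         [3, 0, 1],
         [0, 1, 0]]
path = [(0, 1), (1, 2)]

best = best_arrangement(prefs, path)
print(best, sum(utilities(best, prefs, path)))
```

In this toy instance the welfare-maximizing arrangement places agent 1, who is liked by both others, in the middle seat.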
794
Realistic Cell Type Annotation and Discovery for Single-cell RNA-seq Data
Yuyao Zhai, Liang Chen, Minghua Deng
The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. Cell type annotation plays an essential role in the substantial downstream analysis of scRNA-seq data. Existing methods usually classify the novel cell types in target data as an “unassigned” group and rarely discover the fine-grained cell type structure among them. Besides, these methods carry risks, such as susceptibility to batch effects between reference and target data, which further compromises the inherent discriminability of the target data. Considering these limitations, here we propose a new and practical task called realistic cell type annotation and discovery for scRNA-seq data. In this task, cells from seen cell types are given class labels, while cells from novel cell types are given cluster labels. To tackle this problem, we propose an end-to-end algorithm framework called scPOT from the perspective of optimal transport (OT). Specifically, we first design an OT-based prototypical representation learning paradigm to encourage both global discrimination of clusters and local consistency of cells to uncover the intrinsic structure of target data. Then we propose an unbalanced OT-based partial alignment strategy with statistical filling to detect the cells from the seen cell types across reference and target data. Notably, scPOT also introduces an easy yet effective solution to automatically estimate the overall cell type number in target data. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scPOT over various state-of-the-art clustering and annotation methods.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Machine Learning -> ML: Applications
795
FGNet: Towards Filling the Intra-class and Inter-class Gaps for Few-shot Segmentation
Yuxuan Zhang, Wei Yang, Shaowei Wang
Current few-shot segmentation (FSS) approaches have made tremendous achievements based on prototypical learning techniques. However, due to the scarcity of the support data provided, FSS methods still suffer from the intra-class and inter-class gaps. In this paper, we propose a uniform network to fill both the gaps, termed FGNet. It consists of the novel design of a Self-Adaptive Module (SAM) to emphasize the query feature to generate an enhanced prototype for self-alignment. Such a prototype caters to each query sample itself since it contains the underlying intra-instance information, which gets around the intra-class appearance gap. Moreover, we design an Inter-class Feature Separation Module (IFSM) to separate the feature space of the target class from other classes, which contributes to bridging the inter-class gap. In addition, we present several new losses and a method termed B-SLIC, which help to further enhance the separation performance of FGNet. Experimental results show that FGNet reduces both the gaps for FSS by SAM and IFSM respectively, and achieves state-of-the-art performances on both PASCAL-5i and COCO-20i datasets compared with previous top-performing approaches.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
797
Leveraging Argumentation for Generating Robust Sample-based Explanations
Leila Amgoud, Philippe Muller, Henri Trenquier
Explaining predictions made by inductive classifiers has become crucial with the rise of complex models acting more and more as black-boxes. Abductive explanations are one of the most popular types of explanations that are provided for the purpose. They highlight feature-values that are sufficient for making predictions. In the literature, they are generated by exploring the whole feature space, which is unreasonable in practice. This paper solves the problem by introducing explanation functions that generate abductive explanations from a sample of instances. It shows that such functions should be defined with great care since they cannot satisfy two desirable properties at the same time, namely existence of explanations for every individual decision (success) and correctness of explanations (coherence). The paper provides a parameterized family of argumentation-based explanation functions, each of which satisfies one of the two properties. It studies their formal properties and their experimental behaviour on different datasets.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
800
Parametrized Gradual Semantics Dealing with Varied Degrees of Compensation
Dragan Doder, Leila Amgoud, Srdjan Vesic
Compensation is a strategy that a semantics may follow when it faces dilemmas between quality and quantity of attackers. It allows several weak attacks to compensate one strong attack. It is thus based on compensation degree, which is a pair of two parameters: i) a parameter showing to what extent an attack is weak, and ii) a parameter indicating the number of weak attackers needed to compensate a strong one. Existing principles on compensation do not specify the parameters, thus it is unclear whether semantics satisfying them compensate at only one degree or several degrees, and which ones. This paper proposes a parameterised family of gradual semantics that is based on a parameter $\alpha$ taking values from the interval $(0,+\infty)$, each of which leads to a different semantics. The family unifies multiple semantics that share some principles but differ in their strategy regarding solving dilemmas. Indeed, we show that the two semantics taking the extreme values of $\alpha$ favor respectively quantity and quality, while all the remaining ones compensate at any degree. We define three classes of compensation degrees and show that the novel family is able to compensate at any of them while none of the existing gradual semantics does.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
809
Solving Quantum-Inspired Perfect Matching Problems via Tutte’s Theorem-Based Hybrid Boolean Constraints
Moshe Vardi, Zhiwei Zhang
Determining the satisfiability of Boolean constraint-satisfaction problems with different types of constraints, that is hybrid constraints, is a well-studied problem with important applications. We study here a new application of hybrid Boolean constraints, which arises in quantum computing. The problem relates to constrained perfect matching in edge-colored graphs. While general-purpose hybrid constraint solvers can be powerful, we show that direct encodings of the constrained-matching problem as hybrid constraints scale poorly and special techniques are still needed. We propose a novel encoding based on Tutte’s Theorem in graph theory as well as optimization techniques. Empirical results demonstrate that our encoding, in suitable languages with advanced SAT solvers, scales significantly better than a number of competing approaches on constrained-matching benchmarks. Our study identifies the necessity of designing problem-specific encodings when applying powerful general-purpose constraint solvers.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
827
Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention
Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Yuanzhu Gan, Jian Pu
The monocular depth estimation task has recently shown encouraging prospects, especially for autonomous driving. To tackle the ill-posed problem of 3D geometric reasoning from 2D monocular images, multi-frame monocular methods are developed to leverage the perspective correlation information from sequential temporal frames. However, moving objects like cars and trains usually violate the static scene assumption, leading to feature inconsistency deviation and misaligned cost values, which would mislead the optimization algorithm. In this work, we present CTA-Depth, a Context-aware Temporal Attention guided network for multi-frame monocular Depth estimation. Specifically, we first apply a multi-level attention enhancement module to integrate multi-level image features for obtaining an initial depth and pose estimation. Then the proposed CTA-Refiner is adopted to optimize the depth and pose iteratively. During the CTA-Refiner process, context-aware temporal attention (CTA) is developed to capture the global temporal-context correlations for keeping the feature consistency and estimation integrity of moving objects. In particular, we propose the long-range geometry embedding (LGE) module to produce a long-range temporal geometry prior. Our approach achieves significant improvements (e.g., 13.5% for Abs Rel on the KITTI dataset) over state-of-the-art approaches on three benchmark datasets. We will release our code for implementation after paper acceptance.
List of keywords
Computer Vision -> CV: 3D computer vision
830
InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning
Yutong Ye, Yingbo Zhou, Jiepin Ding, Ting Wang, Mingsong Chen, Xiang Lian
Although Reinforcement Learning (RL) has been extensively studied for Traffic Signal Control (TSC), it still suffers from high learning costs. This is because the trial-and-error attempts during the learning process of RL agents result in poor performance at the beginning and slow convergence to an optimal solution. To address this issue, this paper proposes a novel Adversarial Inverse Reinforcement Learning (AIRL)-based approach, named InitLight, which can generate an effective initial model to improve the jump-start performance for multi-intersection TSC. To be concrete, we design an adversarial architecture to pre-train the RL model from expert trajectories by the learned reward function and transfer the trained initial model to a multi-intersection environment. Based on our proposed pre-training method, the generalizability and robustness of the initial model can be significantly improved. Comprehensive experimental results obtained from various well-known traffic benchmarks show that, compared with the state-of-the-art RL-based TSC methods, InitLight can not only converge faster to a competitive result, but also achieve near-optimal performance after the first episode and be robust to various traffic scenarios.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Transportation
Machine Learning -> ML: Deep reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Sensor networks and smart cities
847
Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method
Xiangyang Zhu, Yiling Pan, Bailin Deng, Bin Wang
Recovering the shape and appearance of real-world objects from natural 2D images is a long-standing and challenging inverse rendering problem. In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras. Our method follows an analysis-by-synthesis approach and consists of two phases. In the initialization phase, we use traditional SfM and MVS methods to reconstruct a virtual scene roughly matching the real scene. Then in the optimization phase, we adopt a hybrid approach to refine the geometry and reflectance, where the geometry is first optimized using an approximate differentiable rendering method, and the reflectance is optimized afterward using a physically-based differentiable rendering method. Our hybrid approach combines the efficiency of approximate methods with the high-quality results of physically-based methods. Extensive experiments on synthetic and real data demonstrate that our method can produce reconstructions with similar or higher quality than state-of-the-art methods while being more efficient.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
848
Label Enhancement via Joint Implicit Representation Clustering
Yunan Lu, Weiwei Li, Xiuyi Jia
Label distribution is an effective label form to portray label polysemy (i.e., the cases where an instance can be described by multiple labels simultaneously). However, the expensive annotation cost of label distributions limits their application to a wider range of practical tasks. Therefore, LE (label enhancement) techniques are extensively studied to solve this problem. Existing LE algorithms mostly estimate label distributions by the instance relation or the label relation. However, they suffer from biased instance relations, limited model capabilities, or suboptimal local label correlations. Therefore, in this paper, we propose a deep generative model called JRC to simultaneously learn and cluster the joint implicit representations of both features and labels, which can be used to improve any existing LE algorithm involving the instance relation or local label correlations. Besides, we develop a novel label distribution recovery module and integrate it with the JRC model, thus constituting a novel generative label enhancement model that utilizes the learned joint implicit representations and instance clusters in a principled way. Finally, extensive experiments validate our proposal.
List of keywords
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Weakly supervised learning
856
Graph Sampling-based Meta-Learning for Molecular Property Prediction
Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, Huajun Chen
Molecular properties are usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Second, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module.
List of keywords
Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Few-shot learning
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
863
TG-VQA: Ternary Game of Video Question Answering
Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen
Video question answering aims at answering a question about the video content by reasoning about the alignment semantics between them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignments. In this work, we resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate the fine-grained visual-linguistic alignment label without label-intensive efforts. Our TG-VQA outperforms the existing state of the art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges well on limited data (10^4 videos), surpassing most of those pre-trained on large-scale data (10^7 videos).
List of keywords
Computer Vision -> CV: Visual reasoning and symbolic representation
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Video analysis and understanding
879
Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method, called the winning ticket, such that it can be trained from scratch to perform almost as well as its dense counterpart. Meanwhile, LTH has scarcely been studied in vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input data consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The source code is available at https://github.com/shen494157765/vit-lottery-ticket-input.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Theory of deep learning
880
A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram
Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu
Geometry problem solving (GPS) is a high-level mathematical reasoning task requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate the research of GPS, we build a new large-scale and finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution programs. Experiments on PGPS9K and an existing dataset Geometry3K validate the superiority of our method over the state-of-the-art neural solvers. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS.
List of keywords
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Multi-modal learning
Multidisciplinary Topics and Applications -> MDA: Education
882
Incomplete Multi-view Clustering via Prototype-based Imputation
Haobin Li, Yunfan Li, Mouxing Yang, Peng Hu, Dezhong Peng, Xi Peng
In this paper, we study how to achieve two characteristics highly expected of incomplete multi-view clustering (IMvC): i) instance commonality, meaning that within-cluster instances should share a common pattern, and ii) view versatility, meaning that cross-view samples should own view-specific patterns. To this end, we design a novel dual-stream model which employs a dual attention layer and a dual contrastive learning loss to learn view-specific prototypes and model the sample-prototype relationship. When a view is missing, our model performs data recovery using the prototypes in the missing view and the sample-prototype relationship inherited from the observed view. Thanks to our dual-stream model, both cluster- and view-specific information can be captured, and thus the instance commonality and view versatility can be preserved to facilitate IMvC. Extensive experiments demonstrate the superiority of our method on five challenging benchmarks compared with 11 approaches. The code can be accessed from https://pengxi.me.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
892
WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation
Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen
Top-down and bottom-up methods are the two mainstream approaches to referring segmentation, but each has its own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary for restraining their respective weaknesses, but that directly averaging their outputs leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods in both interaction and integration, achieving a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results with weights given by confidence scores sampled from the distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which justifies the effectiveness and generality of our method.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Segmentation
902
Cross-Domain Facial Expression Recognition via Disentangling Identity Representation
Tong Liu, Jing Li, Jia Wu, Lefei Zhang, Shanshan Zhao, Jun Chang, Jun Wan
Most existing cross-domain facial expression recognition (FER) works require target domain data to assist the model in analyzing distribution shifts to overcome negative effects. However, it is often hard to obtain expression images of the target domain in practical applications. Moreover, existing methods suffer from the interference of identity information, thus limiting the discriminative ability of the expression features. We exploit the idea of domain generalization (DG) and propose a representation disentanglement model to address the above problems. Specifically, we learn three independent potential subspaces corresponding to the domain, expression, and identity information from facial images. Meanwhile, the extracted expression and identity features are recovered as Fourier phase information reconstructed images, thereby ensuring that the high-level semantics of images remain unchanged after disentangling the domain information. Our proposed method can disentangle expression features from expression-irrelevant ones (i.e., identity and domain features). Therefore, the learned expression features exhibit sufficient domain invariance and discriminative ability. We conduct experiments with different settings on multiple benchmark datasets, and the results show that our method achieves superior performance compared with state-of-the-art methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning
906
A Unification Framework for Euclidean and Hyperbolic Graph Neural Networks
Mehrdad Khatir, Nurendra Choudhary, Sutanay Choudhury, Khushbu Agarwal, Chandan K Reddy
Hyperbolic neural networks are able to capture the inherent hierarchy of graph datasets and are consequently a powerful choice of GNNs. However, they entangle multiple incongruent (gyro-)vector spaces within a layer, which limits their generalization and scalability. In this work, we propose to use the Poincaré disk model as our search space and to apply all approximations on the disk (as if the disk were a tangent space derived from the origin), thus getting rid of all inter-space transformations. Such an approach enables us to propose a hyperbolic normalization layer and to further simplify the entire hyperbolic model to a Euclidean model cascaded with our hyperbolic normalization layer. We apply our proposed nonlinear hyperbolic normalization to the current state-of-the-art homogeneous and multi-relational graph networks. We demonstrate that the model not only leverages the strengths of Euclidean networks, such as interpretability and efficient execution of various model components, but also outperforms both Euclidean and hyperbolic counterparts in our benchmarks.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Representation learning
920
On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu
Pre-training has produced numerous state-of-the-art results in high-level computer vision, while few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a whole set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information to intermediate layers in super-resolution (SR), yielding significant performance gains, while pre-training hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than other alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformers and CNNs. Based on the study, we successfully develop state-of-the-art models for multiple low-level tasks.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications
Computer Vision -> CV: Representation learning
924
STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary 2D Exemplar?
Xin Zhao, Jifeng Guo, Lin Wang, Fanqi Li, Jiahao Li, Junteng Zheng, Bo Yang
Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in computational photography. However, existing methods generally fail to accurately learn arbitrary textures, and therefore may fail to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to extend the given 2D exemplar to arbitrary 3D solid textures. In STS-GAN, multi-scale 2D texture discriminators evaluate the similarity between the given 2D exemplar and slices from the generated 3D texture, encouraging the 3D texture generator to synthesize realistic solid textures. Finally, experiments demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the 2D exemplar.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
928
PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification
Zhenyu Liu, Zhenran Xu, Baotian Hu, Min Zhang
Event Causality Identification (ECI) aims to identify the causality between a pair of event mentions in a document, and is composed of sentence-level ECI (SECI) and document-level ECI (DECI). Previous work applies various reasoning models to identify implicit event causality. However, these models reason about all event causality in the same way, ignoring that inferring most inter-sentence event causality depends on intra-sentence event causality. In this paper, we propose a progressive graph pairwise attention network (PPAT) to take this dependence into account. PPAT applies a progressive reasoning strategy: it first predicts the intra-sentence event causality, and then infers the more implicit inter-sentence event causality based on the SECI result. We construct a sentence boundary event relational graph, and PPAT leverages a simple pairwise attention mechanism, which attends to different reasoning chains on the graph. In addition, we propose a causality-guided training strategy for assisting PPAT in learning causality-related representations on every layer. Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank).
List of keywords
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Information extraction
930
A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning
Lang Qin, Rui Yan, Huajin Tang
In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments show that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X that of DNNs) across different algorithms and environments.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Cognitive robotics
952
Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods
Xinyuan Liu, Hang Xu, Bin Chen, Qiang Zhao, Yike Ma, Chenggang Yan, Feng Dai
Object detection on panoramic/spherical images has developed rapidly in the past few years, and the IoU calculator is a fundamental part of various detector components, i.e., Label Assignment, Loss and NMS. Due to the low efficiency and non-differentiability of spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key to these approximate methods is mapping spherical boxes to planar boxes. However, there exist two problems in these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. Both problems lead to the low accuracy of these methods. Taking the two problems into account, we propose a new sphere-plane boxes transform, called Sph2Pob. Based on Sph2Pob, we propose (1) a differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time cost and high accuracy and (2) an agent Loss, Sph2Pob-Loss, for spherical detection with high flexibility and expansibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Scene analysis and understanding   
953
APR: Online Distant Point Cloud Registration through Aggregated Point Cloud Reconstruction
Quan Liu, Yunsong Zhou, Hongzi Zhu, Shan Chang, Minyi Guo
For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes. Code is available at https://github.com/liuQuan98/APR.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Machine learning for vision
962
Quick Multi-Robot Motion Planning by Combining Sampling and Search
Keisuke Okumura, Xavier Défago
We propose a novel algorithm to solve multi-robot motion planning (MRMP) rapidly, called Simultaneous Sampling-and-Search Planning (SSSP). Conventional MRMP studies mostly take the form of two-phase planning that constructs roadmaps and then finds inter-robot collision-free paths on those roadmaps. In contrast, SSSP simultaneously performs roadmap construction and collision-free pathfinding. This is realized by uniting techniques of single-robot sampling-based motion planning and search techniques of multi-agent pathfinding on discretized spaces. Doing so builds a small search space, leading to quick MRMP. SSSP is guaranteed to eventually find a solution if one exists. Our empirical evaluations in various scenarios demonstrate that SSSP significantly outperforms standard approaches to MRMP, i.e., it solves more problem instances much faster. We also applied SSSP to planning for 32 ground robots in a dense situation.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Distributed and multi-agent planning
Robotics -> ROB: Motion and path planning
Robotics -> ROB: Multi-robot systems
974
DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Shanchuan Wan, Yujin Tang, Yingtao Tian, Tomoyuki Kaneko
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness crucially decides the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies showed the effectiveness of encouraging exploration with intrinsic rewards estimated from novelty in observations. However, there is a gap between the novelty of an observation and an exploration in general, because the stochasticity in the environment as well as the behavior of an agent may affect the observation. To estimate exploratory behaviors accurately, we propose DEIR, a novel method where we theoretically derive an intrinsic reward from a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and materialize the reward with a discriminative forward model. We conduct extensive experiments in both standard and hardened exploration games in MiniGrid to show that DEIR quickly learns a better policy than baselines. Our evaluations in ProcGen demonstrate both generalization capabilities and the general applicability of our intrinsic reward.
List of keywords
Machine Learning -> ML: Reinforcement learning
978
Helpful Information Sharing for Partially Informed Planning Agents
Sarah Keren, David Wies, Sara Bernardini
In many real world settings, an autonomous agent may not have sufficient information and sensory capabilities to accomplish its goals, even when they are achievable. In some cases, the needed information can be provided by another agent, but information sharing might be costly due to limited communication bandwidth and other constraints. We address the problem of Helpful Information Sharing (HIS), which focuses on selecting minimal information to reveal to the partially informed agent (or actor) in order to guarantee it can achieve its goal. As the space of possible information items to share may be large, it is crucial to devise efficient methods to identify optimal interventions that represent the sharing of only information that is critical for task completion and that cannot be acquired by the agent on its own. For this purpose, we offer a novel compilation of HIS to a classical planning problem that can be solved efficiently by any off-the-shelf solver. We provide guarantees of optimality for our approach and describe its extensions to support maximizing robustness and to settings in which the agent needs to decide which sensors to deploy in the environment. We demonstrate the efficiency of our approaches on a set of standard benchmarks as well as on a novel benchmark of an Escape Room.
List of keywords
Planning and Scheduling -> PS: Planning with Incomplete Information
Planning and Scheduling -> PS: Model-based reasoning
981
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu
Audio visual segmentation (AVS) aims to segment the sounding objects for each frame of a given video. To distinguish the sounding objects from silent ones, both audio-visual semantic correspondence and temporal interaction are required. The previous method applies multi-frame cross-modal attention to conduct pixel-level interactions between audio features and visual features of multiple frames simultaneously, which is both redundant and implicit. In this paper, we propose an Audio-Queried Transformer architecture, AQFormer, where we define a set of object queries conditioned on audio information and associate each of them to particular sounding objects. Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries. Besides, an Audio-Bridged Temporal Interaction module is proposed to exchange sounding object-relevant information among multiple frames with the bridge of audio features. Extensive experiments are conducted on two AVS benchmarks to show that our method achieves state-of-the-art performances, especially 7.1% M_J and 7.6% M_F gains on the MS3 setting.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Video analysis and understanding   
990
ProMix: Combating Label Noise via Maximizing Clean Sample Utility
Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, Junbo Zhao
Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheap to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To fill this gap, we propose a novel LNL framework, ProMix, that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and predictions matching the given labels to dynamically expand a base clean sample set. To overcome the potential side effect of an excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.
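The selection rule described in this abstract (keep an example when the model is confident and its prediction agrees with the given label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `matched_high_confidence`, the fixed threshold of 0.9, and the toy probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence


def matched_high_confidence(
    probs: Sequence[Sequence[float]],
    given_labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return indices of examples whose predicted class equals the given
    (possibly noisy) label and whose predicted probability is >= threshold."""
    selected = []
    for i, (p, y) in enumerate(zip(probs, given_labels)):
        pred = max(range(len(p)), key=lambda c: p[c])  # argmax class
        if pred == y and p[pred] >= threshold:
            selected.append(i)
    return selected


# Toy model outputs for four examples over three classes.
probs = [
    [0.95, 0.03, 0.02],  # confident and agrees with label 0
    [0.50, 0.30, 0.20],  # agrees with label 0 but not confident
    [0.05, 0.92, 0.03],  # confident but disagrees with label 2
    [0.01, 0.01, 0.98],  # confident and agrees with label 2
]
labels = [0, 0, 2, 2]
print(matched_high_confidence(probs, labels))  # [0, 3]
```

In a training loop, the returned indices would be added to the base clean set; how that set is built and how the threshold evolves is left open here.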
List of keywords
Machine Learning -> ML: Weakly supervised learning
997
Outsourcing Adjudication to Strategic Jurors
Ioannis Caragiannis, Nikolaj Schwartzbach
We study a scenario where an adjudication task (e.g., the resolution of a binary dispute) is outsourced to a set of agents who are appointed as jurors. This scenario is particularly relevant in a Web3 environment, where no verification of the adjudication outcome is possible, and the appointed agents are, in principle, indifferent to the final verdict. We consider simple adjudication mechanisms that use (1) majority voting to decide the final verdict and (2) a payment function to reward the agents with the majority vote and possibly punish the ones in the minority. Agents interact with such a mechanism strategically: they exert some effort to understand how to properly judge the dispute and cast a yes/no vote that depends on this understanding and on information they have about the rest of the votes. Eventually, they vote so that their utility (i.e., their payment from the mechanism minus the cost due to their effort) is maximized. Under reasonable assumptions about how an agent’s effort is related to her understanding of the dispute, we show that appropriate payment functions can be used to recover the correct adjudication outcome with high probability. Our findings follow from a detailed analysis of the induced strategic game and make use of both theoretical arguments and simulation experiments.
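As a rough illustration of the mechanism class described above, the sketch below decides a binary dispute by majority vote and then pays jurors according to whether they voted with the majority. The function name `adjudicate` and the constant `reward` and `penalty` values are assumptions; the abstract does not specify the payment functions the paper analyzes.

```python
from typing import List, Tuple


def adjudicate(
    votes: List[bool],
    reward: float = 1.0,   # hypothetical payment for voting with the majority
    penalty: float = 0.5,  # hypothetical charge for voting with the minority
) -> Tuple[bool, List[float]]:
    """Return the majority verdict and the payment made to each juror."""
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        raise ValueError("tie: use an odd number of jurors")
    verdict = yes > no
    payments = [reward if v == verdict else -penalty for v in votes]
    return verdict, payments


def utility(payment: float, effort_cost: float) -> float:
    """A juror's utility: payment from the mechanism minus the cost of effort."""
    return payment - effort_cost


print(adjudicate([True, True, False]))  # (True, [1.0, 1.0, -0.5])
```

The strategic question studied in the abstract is how jurors choose effort and votes to maximize `utility`; the sketch only fixes the rules of the game.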
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1003
New Fairness Concepts for Allocating Indivisible Items
Ioannis Caragiannis, Jugal Garg, Nidhi Rathi, Eklavya Sharma, Giovanna Varricchio
For the fundamental problem of fairly dividing a set of indivisible items among agents, envy-freeness up to any item (EFX) and maximin fairness (MMS) are arguably the most compelling fairness concepts proposed so far. Unfortunately, despite significant efforts over the past few years, whether EFX allocations always exist is still an enigmatic open problem, let alone their efficient computation. Furthermore, today we know that MMS allocations are not always guaranteed to exist. These facts weaken the usefulness of both EFX and MMS, despite their appealing conceptual characteristics. We propose two alternative fairness concepts, called epistemic EFX (EEFX) and minimum EFX value fairness (MXS), inspired by EFX and MMS. For both, we explore their relationships to well-studied fairness notions and, more importantly, prove that EEFX and MXS allocations always exist and can be computed efficiently for additive valuations. Our results justify that the new fairness concepts are excellent alternatives to EFX and MMS.
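For readers unfamiliar with EFX, the sketch below checks the standard definition for additive valuations: no agent values another agent's bundle, after removing any single item from it, more than their own bundle. It covers only EFX, not the EEFX or MXS notions introduced in the paper, and the function name and toy instance are assumptions.

```python
from typing import Dict, List, Set


def is_efx(valuations: List[Dict[str, float]], allocation: List[Set[str]]) -> bool:
    """Check envy-freeness up to any item for additive valuations.

    valuations[i][g] is agent i's value for item g;
    allocation[i] is the bundle given to agent i.
    """
    def value(i: int, bundle: Set[str]) -> float:
        return sum(valuations[i][g] for g in bundle)

    n = len(allocation)
    for i in range(n):
        own = value(i, allocation[i])
        for j in range(n):
            if i == j:
                continue
            # Envy must vanish after removing ANY single item from j's bundle.
            for g in allocation[j]:
                if own < value(i, allocation[j] - {g}):
                    return False
    return True


# Two agents with identical additive valuations over three items.
vals = [{"a": 3, "b": 2, "c": 1}, {"a": 3, "b": 2, "c": 1}]
print(is_efx(vals, [{"a"}, {"b", "c"}]))  # True
print(is_efx(vals, [{"c"}, {"a", "b"}]))  # False
```

In the second allocation, the first agent holds value 1 but still values the other bundle at 2 after item "a" is removed, so EFX fails.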
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
1009
Dynamic flows on curved space generated by labeled data
Xinru Hua, Truyen Nguyen, Tam Le, Jose Blanchet, Viet Anh Nguyen
The scarcity of labeled data is a long-standing challenge for many machine learning tasks. We propose a gradient flow method that leverages an existing dataset (i.e., source) to generate new samples that are close to the dataset of interest (i.e., target). We lift both datasets to the space of probability distributions on the feature-Gaussian manifold, and then develop a gradient flow method that minimizes the maximum mean discrepancy loss. To perform the gradient flow of distributions on the curved feature-Gaussian space, we unravel the Riemannian structure of the space and explicitly compute the Riemannian gradient of the loss function induced by the optimal transport metric. For practical applications, we also propose a discretized flow, and provide conditional results guaranteeing the global convergence of the flow to the optimum. We illustrate the results of our proposed gradient flow method on several real-world datasets and show that our method can improve the accuracy of classification models in transfer learning settings.
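The loss named in this abstract, maximum mean discrepancy (MMD), can be estimated from two finite samples with a kernel. The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel on plain feature vectors. It does not reproduce the paper's lifting to the feature-Gaussian manifold or its Riemannian gradient; the kernel choice, bandwidth, and toy samples are assumptions.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a: Sequence[Point], b: Sequence[Point]) -> float:
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


source = [(0.0, 0.0), (1.0, 0.0)]
target = [(3.0, 3.0), (4.0, 3.0)]
print(mmd_squared(source, source))  # zero up to rounding: identical samples
print(mmd_squared(source, target))  # clearly positive: samples are far apart
```

A flow that minimizes this quantity would move the source samples until the printed value for `(source, target)` approaches zero.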
List of keywords
Machine Learning -> ML: Optimization
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Few-shot learning
1021
Fair Division with Two-Sided Preferences
Ayumi Igarashi, Yasushi Kawase, Warut Suksompong, Hanna Sumita
We study a fair division setting in which a number of players are to be fairly distributed among a set of teams. In our model, not only do the teams have preferences over the players as in the canonical fair division setting, but the players also have preferences over the teams. We focus on guaranteeing envy-freeness up to one player (EF1) for the teams together with a stability condition for both sides. We show that an allocation satisfying EF1, swap stability, and individual stability always exists and can be computed in polynomial time, even when teams may have positive or negative values for players. Similarly, a balanced and swap stable allocation that satisfies a relaxation of EF1 can be computed efficiently. When teams have nonnegative values for players, we prove that an EF1 and Pareto optimal allocation exists and, if the valuations are binary, can be found in polynomial time. We also examine the compatibility between EF1 and justified envy-freeness.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
1022
Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution
Tianchen Zhu, Yue Qiu, Haoyi Zhou, Jianxin Li
Recently, Decision Transformer (DT) pioneered casting offline RL as a contextual conditional sequence modeling paradigm, which leverages self-attended autoregression to learn from global target rewards, states, and actions. However, in many applications these signals are severely delayed; for example, the agent may only obtain a reward signal at the end of each trajectory. This delay causes unwanted bias to accumulate when autoregressively learning from global signals. In this paper, we focus on episodic reinforcement learning with trajectory feedback as a representative example. We propose a new reward redistribution algorithm that learns parameterized reward functions and decomposes the long-delayed reward onto each timestep. To improve the adaptability of the redistribution, we formulate this decomposition as a bi-level optimization problem that seeks a global optimum. We extensively evaluate the proposed method on various benchmarks and demonstrate an overwhelming performance improvement under long-delayed settings.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: POMDPs
Uncertainty in AI -> UAI: Sequential decision making
1028
Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms
Yihan Zhang, Meikang Qiu, Hongchang Gao
Numerous machine learning models can be formulated as a stochastic minimax optimization problem, such as imbalanced data classification with AUC maximization. Developing efficient algorithms to optimize such problems is both important and necessary. However, most existing algorithms restrict their focus to the single-machine setting, so they are incapable of dealing with the large communication overhead in a distributed training system. Moreover, most existing communication-efficient optimization algorithms only focus on the traditional minimization problem, failing to handle the minimax optimization problem. To address these challenging issues, in this paper, we develop two novel communication-efficient stochastic gradient descent ascent with momentum algorithms for the distributed minimax optimization problem, which can significantly reduce the communication cost via a two-way compression scheme. However, the compressed momentum makes it considerably challenging to investigate the convergence rate of our algorithms, especially in the presence of the interaction between the minimization and maximization subproblems. In this paper, we successfully address these challenges and establish the convergence rate of our algorithms for nonconvex-strongly-concave problems. To the best of our knowledge, our algorithms are the first communication-efficient algorithms with theoretical guarantees for the minimax optimization problem. Finally, we apply our algorithms to the distributed AUC maximization problem for the imbalanced data classification task. Extensive experimental results confirm the efficacy of our algorithms in saving communication cost.
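The basic update that these algorithms build on, gradient descent on the minimization variable and gradient ascent on the maximization variable, each with a momentum buffer, can be sketched on a toy saddle problem. The sketch is deterministic and single-machine: it omits the stochastic gradients, the multiple workers, and the two-way compression that are the paper's actual contribution. The objective f(x, y) = 0.5 x^2 + x y - 0.5 y^2, the step size, and the momentum coefficient are assumptions chosen so that the iteration converges.

```python
from typing import Tuple


def gda_momentum(steps: int = 500, lr: float = 0.1, beta: float = 0.5) -> Tuple[float, float]:
    """Gradient descent ascent with momentum on
    f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2,
    minimizing over x and maximizing over y. The saddle point is (0, 0)."""
    x, y = 1.0, 1.0
    mx, my = 0.0, 0.0  # momentum buffers for x and y
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        mx = beta * mx + (1.0 - beta) * grad_x
        my = beta * my + (1.0 - beta) * grad_y
        x -= lr * mx  # descent step on the min variable
        y += lr * my  # ascent step on the max variable
    return x, y


x_final, y_final = gda_momentum()
print(x_final, y_final)  # both very close to 0
```

In a distributed setting, the gradients would be computed on separate workers and the exchanged quantities compressed before communication; neither step is modeled here.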
List of keywords
Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
1032
Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations?
Jingyuan Sun, Sien Moens
To decipher the algorithm underlying the human brain’s language representation, previous work probed brain responses to language input with pre-trained artificial neural network (ANN) models fine-tuned on NLU tasks. However, fine-tuning generally updates the full parametric space and distorts pre-trained features, which is cognitively inconsistent with the brain’s robust multi-task learning ability. Prompt-tuning, in contrast with fine-tuning, protects pre-trained weights and learns task-specific embeddings to fit a task. Could prompt-tuning generate representations that better account for the brain’s language representations than fine-tuning? If so, what kind of NLU task leads a pre-trained model to better decode the information represented in the human brain? We investigate these questions by comparing prompt-tuned and fine-tuned representations in neural decoding, that is, predicting the linguistic stimulus from the brain activities evoked by the stimulus. We find that fine-tuning does not significantly outperform prompt-tuning in neural decoding on any of the 10 NLU tasks, implying that a more brain-consistent tuning method yields representations that correlate better with the brain data. Moreover, we identify that tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks, especially the syntactic chunking task. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information when representing languages.
List of keywords
Natural Language Processing -> NLP: Embeddings
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive modeling
1034
Multi-Agent Systems with Quantitative Satisficing Goals
Senthil Rajasekaran, Suguman Bansal, Moshe Vardi
In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of epsilon-equilibria and the existence of equilibria in games where agents have multiple thresholds.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
1045
Learning Efficient Truthful Mechanisms for Trading Networks
Takayuki Osogami, Segev Wasserkrug, Elisheva S. Shamash
Trading networks are an indispensable part of today’s economy, but to compete successfully with others, they must be efficient in maximizing the value they provide to the external market. While prior work relies on truthful disclosure of private information to achieve efficiency, we study the problem of designing mechanisms that result in efficient trading networks by incentivizing firms to truthfully reveal their private information to a third party. Additional desirable properties of such mechanisms are weak budget balance (WBB; the third party need not invest) and individual rationality (IR; firms get non-negative utility). Unlike combinatorial auctions, there may not exist mechanisms that simultaneously satisfy these properties ex post for trading networks. We propose an approach for computing or learning truthful and efficient mechanisms for given networks in a Bayesian setting, where WBB and IR, respectively, are relaxed to ex ante and interim for a given distribution over the private information. We incorporate techniques to reduce computational and sample complexity. We empirically demonstrate that the proposed approach successfully finds the mechanisms with the relaxed properties for trading networks where achieving ex post properties is impossible.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Multidisciplinary Topics and Applications -> MDA: Economics
1063
Black-box Prompt Tuning for Vision-Language Model as a Service
Lang Yu, Qin Chen, Jiaju Lin, Liang He
In the scenario of Model-as-a-Service (MaaS), pre-trained models are usually released as inference APIs. Users are allowed to query those models with manually crafted prompts. Without access to the network structure and gradient information, it is tricky to perform continuous prompt tuning on MaaS, especially for vision-language models (VLMs) considering cross-modal interaction. In this paper, we propose a black-box prompt tuning framework for VLMs to learn task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in the intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance the cross-modal interaction. Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner.
List of keywords
Computer Vision -> CV: Vision and language 
Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Multi-modal learning
1065
An Exact Algorithm for the Minimum Dominating Set Problem
Hua Jiang, Zhifei Zheng
The Minimum Dominating Set (MDS) problem is a classic NP-hard combinatorial optimization problem with many practical applications. Solving MDS is computationally extremely challenging. Previous work on exact algorithms mainly focuses on improving the theoretical time complexity, and existing practical algorithms for MDS are almost all based on heuristic search. In this paper, we propose a novel lower bound and an exact algorithm for MDS. The algorithm implements a branch-and-bound (BnB) approach and employs the new lower bound to reduce the search space. Extensive empirical results show that the new lower bound is efficient in reducing the search space and that the new algorithm is effective on standard instances and real-world instances. To the best of our knowledge, this is the first effective BnB algorithm for MDS.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
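For readers unfamiliar with the problem named in the abstract above, the following is a minimal sketch of what a dominating set is and how a minimum one can be found on a tiny graph. It is a plain brute-force enumeration, not the branch-and-bound algorithm or the lower bound proposed in the paper; the function names and the adjacency-dictionary graph format are assumptions made for illustration.

```python
from itertools import combinations


def is_dominating_set(adj, chosen):
    """Return True if every vertex is in `chosen` or has a neighbour in it.

    adj: dict mapping each vertex to the set of its neighbours
         (hypothetical input format, undirected graph).
    """
    chosen = set(chosen)
    return all(v in chosen or (adj[v] & chosen) for v in adj)


def minimum_dominating_set(adj):
    """Brute force: try candidate sets in order of increasing size.

    Exponential in the number of vertices, so only usable on toy graphs.
    """
    nodes = list(adj)
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_dominating_set(adj, candidate):
                return set(candidate)
    return set(nodes)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 - 4
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(minimum_dominating_set(path))
```

An exact solver such as the one described in the abstract would replace the exhaustive loop with branching plus a lower bound that prunes candidates which cannot beat the best solution found so far.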
1067
Few-shot Classification via Ensemble Learning with Multi-Order Statistics
Sai Yang, Fan Liu, Delong Chen, Jun Zhou
Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining good generalization representation of images on novel classes is the key to improving the few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error in the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper. In this method, after the backbone network, we use multiple branches to create the individual learners in the ensemble learning, with the goal to reduce the storage cost. We then introduce different order statistics pooling in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and our method can produce a state-of-the-art performance on multiple few-shot classification benchmark datasets.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1068
A Refined Upper Bound and Inprocessing for the Maximum k-plex Problem
Hua Jiang, Fusheng Xu, Zhifei Zheng, Bowen Wang, Wei Zhou
A k-plex of a graph G is an induced subgraph in which every vertex has at most k-1 nonadjacent vertices. The Maximum k-plex Problem (MKP) consists in finding a k-plex of the largest size, which is NP-hard and finds many applications. Existing exact algorithms mainly implement a branch-and-bound approach and improve performance by integrating effective upper bounds and graph reduction rules. In this paper, we propose a refined upper bound, which can derive a tighter upper bound than existing methods, and an inprocessing strategy, which performs graph reduction incrementally. We implement a new BnB algorithm for MKP that employs the two components to reduce the search space. Extensive experiments show that both the refined upper bound and the inprocessing strategy are very efficient in the reduction of search space. The new algorithm outperforms the state-of-the-art algorithms on the tested benchmarks significantly.
List of keywords
Search -> S: Combinatorial search and optimisation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Search -> S: Heuristic search
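The k-plex definition given in the abstract above can be checked directly. The sketch below is only an illustration of that definition with an exhaustive search for tiny graphs; it is not the refined upper bound or the inprocessing strategy from the paper, and the function names and adjacency-dictionary format are assumptions.

```python
from itertools import combinations


def is_k_plex(adj, vertices, k):
    """Return True if `vertices` induces a k-plex.

    Follows the definition in the abstract: inside the induced subgraph,
    every vertex has at most k - 1 non-adjacent vertices (not counting itself).
    adj: dict mapping each vertex to the set of its neighbours (assumed format).
    """
    members = set(vertices)
    for v in members:
        non_adjacent = sum(1 for u in members if u != v and u not in adj[v])
        if non_adjacent > k - 1:
            return False
    return True


def maximum_k_plex(adj, k):
    """Brute force: try vertex subsets from largest to smallest.

    Exponential time, intended only to make the definition concrete.
    """
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            if is_k_plex(adj, candidate, k):
                return set(candidate)
    return set()


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3 with an extra vertex 4 attached to vertex 0
    graph = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}
    print(maximum_k_plex(graph, 2))
```

With k = 1 the definition reduces to a clique, which is why the same graph yields a much smaller answer for k = 1 than for k = 2.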
1072
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
Xiangcheng Liu, Tianyi Wu, Guodong Guo
Vision transformer has emerged as a new paradigm in computer vision, showing excellent performance while accompanied by expensive computational cost. Image token pruning is one of the main approaches for ViT compression, due to the facts that the complexity is quadratic with respect to the token number, and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens, or implement a fixed ratio pruning strategy for different input instances. In this work, we propose an adaptive sparse token pruning framework with a minimal cost. Specifically, we firstly propose an inexpensive attention head importance weighted class attention scoring mechanism. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores and thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% and brings only 0.2% drop in top-1 accuracy, which achieves a better trade-off between accuracy and latency than the previous methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Attention models
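The pruning step described in the abstract above (score each token, then compare the score with a threshold) can be illustrated with a few lines of plain Python. This is a simplified sketch under stated assumptions, not the paper's implementation: the scoring formula (a head-weighted sum of class-token attention), the fixed threshold, and all names are hypothetical, whereas the paper learns its thresholds during budget-aware training.

```python
def token_scores(cls_attention, head_weights):
    """Combine per-head class-token attention into one score per token.

    cls_attention[h][i]: attention from the class token to token i in head h
    head_weights[h]:     importance weight of head h
    Both inputs are assumed, illustrative quantities.
    """
    num_tokens = len(cls_attention[0])
    return [
        sum(w * head[i] for w, head in zip(head_weights, cls_attention))
        for i in range(num_tokens)
    ]


def prune_tokens(tokens, scores, threshold):
    """Keep only tokens whose score reaches the threshold."""
    return [t for t, s in zip(tokens, scores) if s >= threshold]


if __name__ == "__main__":
    attention = [[0.5, 0.25, 0.25],   # head 0
                 [0.0, 0.5, 0.5]]     # head 1
    weights = [0.5, 0.5]
    scores = token_scores(attention, weights)
    print(prune_tokens(["a", "b", "c"], scores, threshold=0.3))
```

Because the threshold is compared per input, the number of surviving tokens varies from one image to another, which is the adaptive behaviour the abstract contrasts with fixed-ratio pruning.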
1078
Hierarchical State Abstraction based on Structural Information Principles
Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu
State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.
List of keywords
Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning
1099
LGPConv: Learnable Gaussian Perturbation Convolution for Lightweight Pansharpening
Chen-Yu Zhao, Tian-Jing Zhang, Ran Ran, Zhi-Xuan Chen, Liang-Jian Deng
Pansharpening is a critical yet challenging low-level vision task that aims to obtain a high spatial resolution image by fusing a multispectral (MS) image and a panchromatic (PAN) image. While currently used pansharpening methods are based on convolutional neural networks (CNNs) with standard convolution operation, we observe a strong correlation among the channel dimension of the standard convolution kernel, resulting in a significant computational burden and a large amount of redundancy in the pansharpening neural network. In this work, we propose a novel Learnable Gaussian Perturbation Convolution (LGPConv) capable of replacing and surpassing the standard convolution. With theoretical analysis of the given approach, LGPConv simultaneously exploits two specific properties of standard convolution kernels: 1) correlations within channels: we only learn one premier kernel as a base for further expansion, significantly reducing the parameters and avoiding the difficulty of training caused by redundancy; 2) randomness within channels: we simulate randomness and differences among channels by applying perturbations with Gaussian noise, effectively realizing kernel expansion, which enhances the ability of its nonlinear representation. We demonstrate this new technical contribution to a well-designed LGPConv-based pansharpening network. Extensive experiments reveal that our method achieves the state-of-the-art with a minimal number of parameters, to the best of our knowledge.
List of keywords
Machine Learning -> ML: Convolutional networks
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Applications
1100
Improving Heterogeneous Model Reuse by Density Estimation
Anke Tang, Yong Luo, Han Hu, Fengxiang He, Kehua Su, Bo Du, Yixin Chen, Dacheng Tao
In this paper, we study the problem of multiparty learning, which aims to learn a model using private data from different participants. Model reuse is a promising solution for multiparty learning by assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches are developed. However, although the pre-trained local models are utilized in these approaches, the characteristics of the local data are not well exploited. This motivates us to estimate the density of local data and design an auxiliary model together with the local classifier for reuse. When some local models are not well pre-trained, we further design a multiparty cross-entropy loss for calibration. Unlike existing approaches, we address the heterogeneous model reuse problem from a decision theory perspective and take advantage of recent advances in density estimation. Experimental results on both synthetic and benchmark data demonstrate the superiority of the proposed method.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Classification
1111
Violin: Virtual Overbridge Linking for Enhancing Semi-supervised Learning on Graphs with Limited Labels
Siyue Xie, Da Sun Handason Tam, Wing Cheong Lau
Graph Neural Networks (GNNs) are a family of promising tools for graph semi-supervised learning. However, in training, most existing GNNs rely heavily on a large amount of labeled data, which is rare in real-world scenarios. Unlabeled data with useful information are usually under-exploited, which limits the representation power of GNNs. To handle these problems, we propose Virtual Overbridge Linking (Violin), a generic framework to enhance the learning capacity of common GNNs. By learning to add virtual overbridges between two nodes that are estimated to be semantic-consistent, labeled and unlabeled data can be correlated. Supervised information can be well utilized in training while simultaneously inducing the model to learn from unlabeled data. Discriminative relation patterns extracted from unlabeled nodes can also be shared with other nodes even if they are remote from each other. Motivated by recent advances in data augmentations, we additionally integrate Violin with consistency-regularized training. Such a scheme yields node representations with better robustness, which significantly enhances a GNN. Violin can be readily extended to a wide range of GNNs without introducing additional learnable parameters. Extensive experiments on six datasets demonstrate that our method is effective and robust under low-label rate scenarios, where Violin can boost some GNNs’ performance by over 10% on node classification.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Representation learning
1117
Co-training with High-Confidence Pseudo Labels for Semi-supervised Medical Image Segmentation
Zhiqiang Shen, Peng Cao, Hua Yang, Xiaoli Liu, Jinzhu Yang, Osmar R. Zaiane
Consistency regularization and pseudo labeling-based semi-supervised methods perform co-training using the pseudo labels from multi-view inputs. However, such co-training models tend to converge early to a consensus, degenerating to the self-training ones, and produce low-confidence pseudo labels from the perturbed inputs during training. To address these issues, we propose an Uncertainty-guided Collaborative Mean-Teacher (UCMT) for semi-supervised semantic segmentation with the high-confidence pseudo labels. Concretely, UCMT consists of two main components: 1) collaborative mean-teacher (CMT) for encouraging model disagreement and performing co-training between the sub-networks, and 2) uncertainty-guided region mix (UMIX) for manipulating the input images according to the uncertainty maps of CMT and facilitating CMT to produce high-confidence pseudo labels. Combining the strengths of UMIX with CMT, UCMT can retain model disagreement and enhance the quality of pseudo labels for the co-training segmentation. Extensive experiments on four public medical image datasets including 2D and 3D modalities demonstrate the superiority of UCMT over the state-of-the-art. Code is available at: https://github.com/Senyh/UCMT.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
1132
Imbalanced Node Classification Beyond Homophilic Assumption
Jie Liu, Mengting He, Guangtao Wang, Quoc Viet Hung Nguyen, Xuequn Shang, Hongzhi Yin
Imbalanced node classification widely exists in real-world networks where graph neural networks (GNNs) are usually highly inclined to majority classes and suffer from severe performance degradation on classifying minority class nodes. Various imbalanced node classification methods have been proposed recently which construct synthetic nodes and edges w.r.t. minority classes to balance the label/topology distribution. However, they are all based on the homophilic assumption that nodes of the same label tend to connect, despite the wide existence of heterophilic edges in real-world graphs. Thus, they uniformly aggregate features from both homophilic and heterophilic neighbors and rely on feature similarity to generate synthetic edges, which cannot be applied to imbalanced graphs with high heterophily. To address this problem, we propose a novel GraphSANN for imbalanced node classification on both homophilic and heterophilic graphs. Firstly, we propose a unified feature mixer to generate synthetic nodes with both homophilic and heterophilic interpolation in a unified way. Next, by randomly sampling edges between synthetic nodes and existing nodes as candidate edges, we design an adaptive subgraph extractor to dynamically extract the contextual subgraphs of candidate edges with flexible ranges. Finally, we develop a multi-filter subgraph encoder which constructs multiple different filter channels to discriminatively aggregate neighbors’ information along the homophilic and heterophilic edges. Extensive experiments on eight benchmark datasets demonstrate the superiority of our model for imbalanced node classification on both homophilic and heterophilic graphs.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Learning graphical models
1134
Minimally Supervised Contextual Inference from Human Mobility: An Iterative Collaborative Distillation Framework
Jiayun Zhang, Xinyang Zhang, Dezhi Hong, Rajesh Gupta, Jingbo Shang
Inferring the context about trips and users from mobility data is valuable for mobile service providers to understand their customers and improve their services. Existing methods require a large amount of labels for training, which is hard to meet in practice. In this paper, we study a more practical yet challenging setting—contextual inference using mobility data with minimal supervision (i.e., using a few labels per class and massive unlabeled data). A typical solution is to apply semi-supervised methods that follow a self-training framework to bootstrap a model based on all features. However, the minimal labeled set brings a high risk of overfitting to self-training, leading to unsatisfactory performance. We propose a novel collaborative distillation framework STCOLAB. Specifically, it sequentially trains spatial and temporal modules at each iteration following the supervision of demographic labels. In addition, it distills knowledge to the module being trained using the logits produced by the latest trained module of the other modality, thereby combining the knowledge learned by both modalities to mutually calibrate the modules. Extensive experiments on two real-world datasets show STCOLAB achieves significantly more accurate demographic inference than various baselines.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
1137
Robust Image Ordinal Regression with Controllable Image Generation
Yi Cheng, Haochao Ying, Renjun Hu, Jinhong Wang, Wenhao Zheng, Xiao Zhang, Danny Chen, Jian Wu
Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
1145
COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
Fangyi Zhu, Stéphane Bressan, See Kiong Ng
Vision outlooker improves the performance of vision transformers, which implements a self-attention mechanism by adding outlook attention, a form of local attention. In natural language processing, as has been the case in computer vision and other domains, transformer-based models constitute the state-of-the-art for most processing tasks. In this domain, too, many authors have argued and demonstrated the importance of local context. We present an outlook attention mechanism, COOL, for natural language processing. COOL, added on top of the self-attention layers of a transformer-based model, encodes local syntactic context considering word proximity and more pair-wise constraints than dynamic convolution used by existing approaches. A comparative empirical performance evaluation of an implementation of COOL with different transformer-based models confirms the opportunity for improvement over a baseline using the original models alone for various natural language processing tasks, including question answering. The proposed approach achieves competitive performance with existing state-of-the-art methods on some tasks.
List of keywords
Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Language models
1152
WBFlow: Few-shot White Balance for sRGB Images via Reversible Neural Flows
Chunxiao Li, Xuejing Kang, Anlong Ming
The white balance methods for sRGB images (sRGB-WB) aim to remove their non-linear color cast without access to raw values. Although the existing sRGB-WB methods have achieved increasingly better white balance (WB) results, their generalization to the sRGB images from multiple cameras is still under-explored. In this paper, we propose an sRGB-WB network named WBFlow, which not only performs superior white balance for sRGB images but also generalizes to multiple cameras well. In detail, we take advantage of neural flow to ensure the reversibility of WBFlow, which allows it to losslessly render color-cast sRGB images back to pseudo-raw features for linear white balancing, thus achieving superior performance. Furthermore, inspired by the inter-camera approach, we design a camera transformation (CT) in the pseudo-raw feature space for generalizing the WBFlow to different cameras via few-shot learning. Given a few sRGB images from an untrained camera, our WBFlow can perform well on this camera by learning the camera-specific parameters of CT from these images. Extensive experiments show that WBFlow achieves state-of-the-art multi-camera generalization and WB accuracy for sRGB images on three public datasets and our rendered multi-camera sRGB dataset.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications
1155
Deep Multi-view Subspace Clustering with Anchor Graph
Chenhang Cui, Yazhou Ren, Jingyu Pu, Xiaorong Pu, Lifang He
Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). To be specific, DMCAG firstly learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct an anchor graph with small size for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy to keep the consistency of the labels based on contrastive learning to enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
List of keywords
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Self-supervised Learning
1180
Faster Exact MPE and Constrained Optimization with Deterministic Finite State Automata
Filippo Bistaffa
We propose a concise function representation based on deterministic finite state automata for exact most probable explanation and constrained optimization tasks in graphical models. We then exploit our concise representation within Bucket Elimination (BE). We denote our version of BE as FABE. FABE significantly improves the performance of BE in terms of runtime and memory requirements by minimizing redundancy. Indeed, results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art, leading to significant runtime improvements (up to 2 orders of magnitude in our tests).
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Uncertainty in AI -> UAI: Graphical models
1194
Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization
Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren
Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards more precise action recognition while attention capitalizes on the important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Action and behavior recognition
1201
Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning
Bin Zhang, Lijuan Li, Zhiwei Xu, Dapeng Li, Guoliang Fan
In multi-agent reinforcement learning (MARL), self-interested agents attempt to establish equilibrium and achieve coordination depending on game structure. However, existing MARL approaches are mostly bound by the simultaneous actions of all agents in the Markov game (MG) framework, and few works consider the formation of equilibrium strategies via asynchronous action coordination. In view of the advantages of Stackelberg equilibrium (SE) over Nash equilibrium, we construct a spatio-temporal sequential decision-making structure derived from the MG and propose an N-level policy model based on a conditional hypernetwork shared by all agents. This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents. Agents can learn heterogeneous SE policies while still maintaining parameter sharing, which leads to reduced cost for learning and storage and enhanced scalability as the number of agents increases. Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios, and performs admirably in immensely complex settings including cooperative tasks and mixed tasks.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
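As a small illustration of the sequential decision structure described in the abstract above, the following sketch computes a pure-strategy Stackelberg equilibrium in a two-player matrix game: the leader commits first and the follower best-responds to the observed action. The function name, the payoff matrices, and the tie-breaking rule (ties resolved in the leader's favor) are assumptions made for this example; it is not the N-level policy model or the hypernetwork proposed in the paper.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) for a
    pure-strategy Stackelberg equilibrium of a two-player matrix game.

    leader_payoff[i][j] and follower_payoff[i][j] are the payoffs when the
    leader plays row i and the follower plays column j (hypothetical inputs).
    """
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # The follower observes the leader's action i and best-responds.
        top = max(follower_row)
        # Assumption: ties among follower best responses favor the leader.
        j = max(
            (col for col, v in enumerate(follower_row) if v == top),
            key=lambda col: leader_payoff[i][col],
        )
        if best is None or leader_payoff[i][j] > best[2]:
            best = (i, j, leader_payoff[i][j])
    return best


# Illustrative game: row 0 strictly dominates for the leader, so simultaneous
# play ends at (0, 0) with leader payoff 2; committing to row 1 yields 3.
leader = [[2, 4], [1, 3]]
follower = [[1, 0], [0, 2]]
result = stackelberg_pure(leader, follower)
```

In this toy game the leader does better by committing to a dominated action, which is one simple way to see why sequential decision-making can be preferable to simultaneous moves.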
1213
Enhancing Datalog Reasoning with Hypertree Decompositions
Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks
Datalog reasoning based on the seminaive evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental reasoning with recursive Datalog programs. Moreover, hypertree decompositions require additional data structures and thus introduce nonnegligible overhead in both runtime and memory consumption. In this paper, we provide algorithms that exploit hypertree decompositions for the materialisation and incremental evaluation of Datalog programs. Furthermore, we combine this approach with standard Datalog reasoning algorithms in a modular fashion so that the overhead caused by the decompositions is reduced. Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude.
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Semantic Web
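For readers unfamiliar with the seminaive evaluation strategy mentioned in the abstract above, here is a minimal sketch on the textbook transitive-closure program. It shows only the baseline idea (join newly derived facts, not all facts, in each round); the function name and data layout are assumptions for illustration, and it does not implement the hypertree-decomposition algorithms of the paper.

```python
def transitive_closure_seminaive(edges):
    """Seminaive evaluation of the Datalog program
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    # Index edge facts by source node to make the join cheap.
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts derived in the previous round only
    while delta:
        # Join only the new facts with edge, avoiding rederivation work
        # that a naive evaluation would repeat in every round.
        derived = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        delta = derived - path
        path |= delta
    return path


closure = transitive_closure_seminaive({(1, 2), (2, 3), (3, 4)})
```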
1219
Federated Graph Semantic and Structural Learning
Wenke Huang, Guancheng Wan, Mang Ye, Bo Du
Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most related work focuses on traditional distributed tasks such as images and voice, and cannot handle graph structures. This paper first reveals that local client distortion is brought by both node-level semantics and graph-level structure. First, for node-level semantics, we find that contrasting nodes from distinct classes helps to provide well-performing discrimination. We pull the local node towards the global node of the same class and push it away from the global nodes of different classes. Second, we postulate that a well-structured graph neural network possesses similarity for neighbors due to the inherent adjacency relationships. However, aligning each node with adjacent nodes hinders discrimination due to potential class inconsistency. We transform the adjacency relationships into a similarity distribution and leverage the global model to distill the relation knowledge into the local model, which preserves the structural information and discriminability of the local model. Empirical results on three graph datasets demonstrate the superiority of the proposed method over counterparts.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Sequence and graph learning
1220
GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control
Yilin Liu, Guiyang Luo, Quan Yuan, Jinglin Li, Lei Jin, Bo Chen, Rui Pan
Multi-agent reinforcement learning (MARL) methods are increasingly popular for coordinating traffic lights (CTL), treating each intersection as an agent. However, existing MARL approaches either treat the agents as completely homogeneous, i.e., the same network and parameters for each agent, or as completely heterogeneous, i.e., different networks and parameters for each agent. This makes it difficult to balance accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agents' environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions for maintaining a learnable and dynamic clustering: one applies mutual information estimation for better stability, and the other aims to maximize the separability between groups. Finally, GPLight enforces that the agents in a group share the same network and parameters. In this way, the cooperation between agents of the same group reduces the complexity, while different groups reflect the differences between agents to ensure accuracy. To verify the effectiveness of our method, we conducted experiments on both synthetic and real-world datasets, with up to 1,000 intersections. Compared with state-of-the-art methods, experimental results demonstrate the superiority of our proposed method, especially in large-scale CTL.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning
1223
Augmenting Automated Spectrum Based Fault Localization For Multiple Faults
Prantik Chatterjee, Jose Campos, Rui Abreu, Subhajit Roy
Spectrum-based Fault Localization (SBFL) uses the coverage of test cases and their outcome (pass/fail) to predict the "suspiciousness" of program components, e.g., lines of code. SBFL is, perhaps, the most successful fault localization technique due to its simplicity and scalability. However, SBFL heuristics do not perform well in scenarios where a program may have multiple faulty components. In this work, we propose a new algorithm that "augments" previously proposed SBFL heuristics to produce a ranked list where faulty components ranked low by base SBFL metrics are ranked significantly higher. We implement our ideas in a tool, ARTEMIS, that attempts to "bubble up" faulty components which are ranked lower by base SBFL metrics. We compare our technique to the most popular SBFL metrics and demonstrate statistically significant improvement in the developer effort for fault localization with respect to the basic strategies.
List of keywords
Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Multidisciplinary Topics and Applications -> MDA: Software engineering
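To make the notion of a base SBFL metric in the abstract above concrete, the sketch below scores components with Ochiai, one widely used suspiciousness formula: ef / sqrt(total_failed * (ef + ep)), where ef and ep count the failing and passing tests that execute a component. The choice of Ochiai, the function name, and the input layout are assumptions for illustration; the abstract does not say which metrics ARTEMIS augments, and this is not the ARTEMIS algorithm.

```python
import math


def ochiai(coverage, outcomes):
    """Compute Ochiai suspiciousness per component.

    coverage[t] is the set of components executed by test t;
    outcomes[t] is True if test t passed (hypothetical input format).
    """
    total_failed = sum(1 for ok in outcomes if not ok)
    components = set().union(*coverage)
    scores = {}
    for c in components:
        # ef / ep: failing / passing tests that execute component c.
        ef = sum(1 for cov, ok in zip(coverage, outcomes) if c in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, outcomes) if c in cov and ok)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[c] = ef / denom if denom else 0.0
    return scores


# Component "b" is executed only by the failing test, so it ranks highest.
scores = ochiai(
    coverage=[{"a", "b"}, {"a"}, {"a", "c"}],
    outcomes=[False, True, True],
)
ranking = sorted(scores, key=scores.get, reverse=True)
```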
1224
From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement
Wanyu Wu, Wei Wang, Zheng Wang, Kui Jiang, Xin Xu
Most existing Low-Light Image Enhancement (LLIE) methods are primarily designed to improve brightness in dark regions, which suffer from severe degradation in nighttime images. However, these methods have paid little attention to another major source of visibility degradation: the glow effects in real night scenes. Glow effects are inevitable in the presence of artificial light sources and cause further diffused blurring when directly enhanced. To address this issue, we treat the glow suppression task as learning physical glow generation via multiple scattering estimation according to the Atmospheric Point Spread Function (APSF). In response to the challenges posed by uneven glow intensity and varying source shapes, an APSF-based Nighttime Imaging Model with Near-field Light Sources (NIM-NLS) is specifically derived to design a scalable Light-aware Blind Deconvolution Network (LBDN). The glow-suppressed result is then brightened via a Retinex-based Enhancement Module (REM). Remarkably, the proposed glow suppression method is based on zero-shot learning and does not rely on any paired or unpaired training data. Empirical evaluations demonstrate the effectiveness of the proposed method in both glow suppression and low-light enhancement tasks.
List of keywords
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Segmentation
1231
Rainbow Cycle Number and EFX Allocations: (Almost) Closing the Gap
Shayan Chashm Jahan, Masoud Seddighin, Seyed Mohammad Seyed Javadi, Mohammad Sharifi
Recently, some studies on the fair allocation of indivisible goods have noticed a connection between a purely combinatorial problem called the Rainbow Cycle problem and a fairness notion known as $\efx$: assuming that the rainbow cycle number for parameter $d$ (i.e. $\rainbow(d)$) is $O(d^\beta \log^\gamma d)$, we can find a $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(n^{\frac{\beta}{\beta+1}}\log^{\frac{\gamma}{\beta +1}} n)$ number of discarded goods \cite{chaudhury2021improving}. The best upper bound on $\rainbow(d)$ is improved in a series of works to $O(d^4)$ \cite{chaudhury2021improving}, $O(d^{2+o(1)})$ \cite{berendsohn2022fixed}, and finally to $O(d^2)$ \cite{Akrami2022}.\footnote{We refer to the footnote at the end of the introduction for a short note on the result of \cite{Akrami2022}.} Also, via a simple observation, we have $\rainbow(d) \in \Omega(d)$ \cite{chaudhury2021improving}. In this paper, we introduce another problem in extremal combinatorics. For a parameter $\ell$, we define the rainbow path degree and denote it by $\ech(\ell)$. We show that any lower bound on $\ech(\ell)$ yields an upper bound on $\rainbow(d)$. Next, we prove that $\ech(\ell) \in \Omega(\ell^2/\log n)$, which yields an almost tight upper bound of $\rainbow(d) \in O(d \log d)$. This in turn proves the existence of a $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(\sqrt{n \log n})$ number of discarded goods. In addition, for the special case of the Rainbow Cycle problem in which the edges in each part form a permutation, we improve the upper bound to $\rainbow(d) \leq 2d-4$. We leverage $\ech(\ell)$ to achieve this bound. Our conjecture is that the exact value of $\ech(\ell)$ is $\lfloor \frac{\ell^2}{2} \rfloor -1$. We provide some experiments that support this conjecture. Assuming this conjecture is correct, we have $\rainbow(d) \in \Theta(d)$.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Multidisciplinary Topics and Applications -> MDA: Economics
1235
Independent Feature Decomposition and Instance Alignment for Unsupervised Domain Adaptation
Qichen He, Siying Xiao, Mao Ye, Xiatian Zhu, Ferrante Neri, Dongde Hou
Existing Unsupervised Domain Adaptation (UDA) methods typically attempt to perform knowledge transfer in a domain-invariant space explicitly or implicitly. In practice, however, the obtained features are often mixed with domain-specific information, which causes performance degradation. To overcome this fundamental limitation, this article presents a novel independent feature decomposition and instance alignment method (IndUDA for short). Specifically, based on an invertible flow, we project the base features into a decomposed latent space with domain-invariant and domain-specific dimensions. To drive semantic decomposition independently, we then swap the domain-invariant part across source and target domain samples of the same category and require that their inverted features be consistent at the class level with the original features. By treating domain-specific information as noise, we replace it with Gaussian noise and further regularize source model training by instance alignment, i.e., requiring the base features to be close to the corresponding reconstructed features. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on popular UDA benchmarks. The appendix and code are available at https://github.com/ayombeach/IndUDA.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1236
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Daqian Shao, Marta Kwiatkowska
Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we use the probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
List of keywords
Machine Learning -> ML: Reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
Robotics -> ROB: Learning in robotics
1238
c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization
Shuhei Watanabe, Frank Hutter
Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms, and real-world applications often impose constraints, such as memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on 81 expensive HPO settings with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. See https://arxiv.org/abs/2211.14411 for the latest version with Appendix.
List of keywords
Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning
1239
Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
Shuhei Watanabe, Noor Awad, Masaki Onishi, Frank Hutter
Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on “Multiobjective Hyperparameter Optimization for Transformers”. See https://arxiv.org/abs/2212.06751 for the latest version.
List of keywords
Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Meta-learning
1241
PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces
Shuhei Watanabe, Archit Bansal, Frank Hutter
The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient. See https://arxiv.org/abs/2304.10255 for the latest version.
List of keywords
Machine Learning -> ML: Hyperparameter optimization
Machine Learning -> ML: Automated machine learning
1242
Null-Space Diffusion Sampling for Zero-Shot Point Cloud Completion
Xinhua Cheng, Nan Zhang, Jiwen Yu, Yinhuai Wang, Ge Li, Jian Zhang
Point cloud completion aims at estimating the complete data of objects from degraded observations. Although existing completion methods achieve impressive performance, they rely heavily on degraded-complete data pairs for supervision. In this work, we propose a novel framework named Null-Space Diffusion Sampling (NSDS) to solve the point cloud completion task in a zero-shot manner. By leveraging a pre-trained point cloud diffusion model as the off-the-shelf generator, our sampling approach can generate desired completion outputs with the guidance of the observed degraded data without any extra training. Furthermore, we propose a tolerant loop mechanism to improve the quality of completion results for hard cases. Experimental results demonstrate that our zero-shot framework achieves completion performance superior to unsupervised methods and competitive with supervised methods in various degraded situations.
List of keywords
Computer Vision -> CV: 3D computer vision
1250
Voice Guard: Protecting Voice Privacy with Strong and Imperceptible Adversarial Perturbation in the Time Domain
Jingyang Li, Dengpan Ye, Long Tang, Chuanxi Chen, Shengshan Hu
Adversarial examples are an emerging tool for voice privacy protection. By adding imperceptible noise to public audio, they prevent tamperers from using zero-shot Voice Conversion (VC) to synthesize high-quality speech with the target speaker's identity. However, many existing studies ignore the human perception characteristics of audio data, and it is challenging to generate strong and imperceptible adversarial audio. In this paper, we propose the Voice Guard defense method, which uses a novel method to advance the adversarial perturbation to the time domain to avoid the loss caused by cross-domain conversion. A psychoacoustic model is also introduced into the defense against VC for the first time, which greatly improves the disruption ability and concealment of adversarial audio. We also standardize the evaluation metrics of adversarial audio for the first time, combining multi-dimensional metrics to define the criteria for defense. We evaluate Voice Guard on several state-of-the-art zero-shot VC models. The experimental results show that our method can ensure the perceptual quality of adversarial audio while having a strong defense capability, and is far superior to previous works in terms of disruption ability and concealment.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Security and privacy
Natural Language Processing -> NLP: Speech
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
1251
Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding
Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu
Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmentary. The recent one-stage method aggregates only pixel contexts from image features to phrase features, which may incur semantic misalignment due to lacking object priors. To realize more comprehensive visual-linguistic interaction, we propose to enrich phrases with coupled pixel and object contexts by designing a Phrase-Pixel-Object Transformer Decoder (PPO-TD), where both fine-grained part details and coarse-grained entity clues are aggregated to phrase features. In addition, we also propose a Phrase-Object Contrastive Loss (POCL) to pull closer the matched phrase-object pairs and push away unmatched ones for aggregating more precise object contexts from more phrase-relevant object tokens. Extensive experiments on the PNG benchmark show our method achieves new state-of-the-art performance with large margins.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Segmentation
1252
Latent Processes Identification From Multi-View Time Series
Zenan Huang, Haobo Wang, Junbo Zhao, Nenggan Zheng
Understanding the dynamics of time-series data typically requires identifying the unique latent factors for data generation, a.k.a. latent processes identification. Driven by the independence assumption, existing works have made great progress in handling single-view data. However, extending them to multi-view time-series data is non-trivial because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independence assumption; (ii) the factors from different views generally overlap and are hard to aggregate into a complete set. In this work, we propose a novel framework MuLTI that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by establishing an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series.
List of keywords
Machine Learning -> ML: Causality
Machine Learning -> ML: Multi-view learning
1265
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Gupta, Anand Mishra
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively different and more challenging than the traditionally studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset RETVQA and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Applications
Machine Learning -> ML: Multi-modal learning
1268
Towards Semantics- and Domain-Aware Adversarial Attacks
Jianping Zhang, Yung-chieh Huang, Weibin Wu, Michael Lyu
Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
1272
Scaling Goal-based Exploration via Pruning Proto-goals
Akhil Bagaria, Tom Schaul
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty- or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Deep reinforcement learning
1274
SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting
Yiduo Li, Zhongwen Rao, Zhe Li, Shiyi Qi, Lujia Pan, Zenglin Xu
Transformers have achieved remarkable performance in long time series forecasting (LTSF), thanks to their powerful capture of long-range dependencies. However, prediction on long time sequences is significantly affected by the ability to capture reliable local dependencies in segments of sequences. To address this issue, we introduce the SMARTformer, denoting SeMiAutoRegressive Transformer with Efficient Integrated Window Attention. In detail, the semi-autoregressive (SAR) decoder first predicts each segment of the sequence iteratively to comprehensively capture local context in the manner of autoregressive (AR) decoding; based on the previous output, it then refines the whole sequence in a non-autoregressive (NAR) way. Therefore, SAR benefits from both the global horizon of NAR and the local detail capturing of AR. Moreover, it can be used as a general plug-in to further enhance the predictive performance of various transformer models on time series. Furthermore, to obtain complementary clues in local and enlarged receptive fields, we propose the Integrated Window Attention to separately conduct both local self-attention in multi-scale windows and global attention across windows. Notably, with linear complexity, this design also brings significant improvement in computational efficiency. Finally, extensive studies on five benchmark datasets show the effectiveness of SMARTformer against SOTA works, with an improvement of 10.2% and 18.4% in multivariate and univariate long-term forecasting, respectively.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Regression
Machine Learning -> ML: Time series and data streams
1310
Fairness via Group Contribution Matching
Tianlin Li, Zhiming Li, Anran Li, Mengnan Du, Aishan Liu, Qing Guo, Guozhu Meng, Yang Liu
Fairness issues in Deep Learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) in the training process to understand the cause of such bias development process. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples are drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address the above issues, we propose an easy but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that our GCM effectively improves fairness and outperforms other methods significantly.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
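The oversampling baseline mentioned in the abstract above (drawing an equal number of samples from each subgroup in every training iteration) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `balanced_batch`, the `(features, group)` pair layout, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_batch(samples, per_group, rng):
    """Draw `per_group` samples (with replacement) from every subgroup.

    `samples` is a list of (features, group) pairs, where `group` is the
    sensitive-attribute value. Sampling with replacement is what makes this
    oversampling: small subgroups are repeated to match large ones.
    """
    by_group = defaultdict(list)
    for features, group in samples:
        by_group[group].append((features, group))

    batch = []
    for group in sorted(by_group):  # sorted only for a reproducible order
        batch.extend(rng.choices(by_group[group], k=per_group))
    return batch


# Hypothetical toy data: group "a" has 8 samples, group "b" only 2.
data = [(i, "a") for i in range(8)] + [(i, "b") for i in range(2)]
batch = balanced_batch(data, per_group=4, rng=random.Random(0))
counts = {g: sum(1 for _, grp in batch if grp == g) for g in ("a", "b")}
print(counts)  # {'a': 4, 'b': 4}
```

As the abstract notes, equal sample counts alone do not equalize the contribution of each subgroup, which is the gap the proposed GCM method targets; the sketch covers only the baseline.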
1327
Diagram Visual Grounding: Learning to See with Gestalt-Perceptual Attention
Xin Hu, Lingling Zhang, Jun Liu, Xinyu Zhang, Wenjun Wu, Qianying Wang
Diagram visual grounding aims to capture the correlation between language expressions and local objects in the diagram, and plays an important role in applications like textbook question answering and cross-modal retrieval. Most diagrams consist of several colors and simple geometries. This results in sparse low-level visual features, which further widens the gap between low-level visual and high-level semantic features of diagrams. This phenomenon brings challenges to diagram visual grounding. To solve the above issues, we propose a gestalt-perceptual attention model to align the diagram objects and language expressions. For low-level visual features, inspired by the gestalt principles that model the human visual system, we build a gestalt-perception graph network to complement the features learned by the traditional backbone network. For high-level semantic features, we design a multi-modal context attention mechanism to facilitate the interaction between diagrams and language expressions, so as to enhance the semantics of diagrams. Finally, guided by diagram features and linguistic embedding, the target query is gradually decoded to generate the coordinates of the referred object. By conducting comprehensive experiments on diagrams and natural images, we demonstrate that the proposed model achieves superior performance over the competitors. Our code will be released at https://github.com/AIProCode/GPA.
List of keywords
Computer Vision -> CV: Vision and language
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
1340
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang
With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval through the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks, due to the lack of a decoder architecture and of pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap remains between the CLIP representations of different modalities, and CLIP cannot model the offset of this gap, which causes concepts to fail to transfer across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With vision-free unsupervised training, Knight achieves state-of-the-art performance among zero-shot methods for image captioning and video captioning.
List of keywords
Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Vision and language
Natural Language Processing -> NLP: Language generation
1361
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Hanyuan Chen, Jun-Yan He, Wangmeng Xiang, Wei Liu, Zhi-Qi Cheng, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Human pose estimation is a complicated structured data sequence modeling task. Most existing methods only consider the pair-wise interaction of human body joints in model learning. Unfortunately, this causes 3D pose estimation to fail in difficult cases such as joints overlapping, and pose fast-changing, as pair-wise relations cannot exploit fine-grained human body priors in pose estimation. To this end, we revamped the 3D pose estimation framework with a High-order Directed Transformer (HDFormer), which coherently exploits the high-order relevances to boost the performance of pose estimation. Specifically, HDFormer adopts both self-attention and high-order attention schemes to build up a multi-order attention module to perform the information flow interaction including the first-order "joint↔joint", second-order "bone↔joint" as well as high-order "hyperbone↔joint" relationships (hyperbone is defined as a joint set), compensating the hard cases prediction in fast-changing and heavy occlusion scenarios. Moreover, modernized CNN techniques are applied to upgrade the transformer-based architecture to speed up the HDFormer, achieving a favorable trade-off between effectiveness and efficiency. We compare our model with other SOTA models on the datasets Human3.6M and MPI-INF-3DHP. The results demonstrate that the proposed HDFormer achieves superior performance with only 1/10 parameters and much lower computational cost compared to the current SOTAs. Moreover, HDFormer can be applied to various types of real-world applications, enabling real-time and accurate 3D pose estimation. The source code is in https://shorturl.at/aISY0.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Video analysis and understanding
1363
Artificial Agents Inspired by Human Motivation Psychology for Teamwork in Hazardous Environments
Anupama Arukgoda, Erandi Lakshika, Michael Barlow, Kasun Gunawardana
Multi-agent literature explores personifying artificial agents with personality, emotions or cognitive biases to produce “typical”, believable agents. In this study, we demonstrate the potential of endowing artificial agents with a motivation, using human implicit motivation psychology theory that introduces 3 motive profiles – power, achievement and affiliation, to create diverse, risk-aware agents. We first devise a framework to model these motivated agents (or agents with any inherent behavior), that can activate different strategies depending on the circumstances. We conduct experiments on a fire-fighting task domain, evaluate how motivated teams perform, and draw conclusions on appropriate team compositions to be deployed in environments with different risk levels. Our framework generates predictable agents as their resulting behaviors align with the inherent characteristics of their motives. We find that motivational diversity within teams is beneficial in dynamic collaborative environments, especially as the task risk level increases. Furthermore, we observed that the best composition in terms of the performance metrics used to evaluate team compositions, does not remain the same as the collaboration level required to achieve goals changes. These results have implications for future designs of risk-aware autonomous teams and Human-AI teams, as they highlight the prospects of creating better artificial teammates and performance gains that could be achieved through anthropomorphized motivated agents.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Human-AI collaboration
1378
End-to-End Combinatorial Ensemble Learning
James Kotary, Vincenzo Di Vito Francesco, Ferdinando Fioretto
Ensemble learning is an important class of algorithms aimed at creating accurate and robust machine learning models by combining predictions from individual models. A key challenge in designing these algorithms is to find effective ways to combine the individual predictions for any particular input sample. This paper addresses this challenge and proposes an integration of constrained optimization and learning to derive specialized consensus rules. The resulting strategy learns to select appropriate predictors to combine for a particular input sample. The paper shows how to cast the ensemble learning task as a differentiable selection program, which is trained end-to-end within the ensemble learning model. Results over several benchmarks demonstrate the ability of the proposed solution to substantially outperform common and advanced consensus rules in a variety of settings and learning tasks.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Machine Learning -> ML: Applications
1379
Folded Optimization in End-to-End Learning
James Kotary, My Dinh, Ferdinando Fioretto
The integration of constrained optimization models in deep learning has led to promising advances in both the machine learning and optimization domains. When an optimization problem has some undefined parameters, it can be viewed as a function that maps those parameters to corresponding optimal solutions. Such mappings are useful in learning constrained representations, for tasks that require special structure in the predictions or feature embeddings of a machine learning model. A primary challenge in learning with these integrated models is backpropagation through optimization mapping, which typically lacks a closed form. A common approach is unrolling, which relies on automatic differentiation through the operations of an iterative solver. While flexible and general, unrolling can encounter accuracy and efficiency issues in practice. These issues can be avoided by differentiating the optimization mapping analytically, but current frameworks impose rigid requirements on the optimization problem’s form. This paper provides theoretical insights into the backpropagation of unrolled optimizers, which lead to equivalent but efficiently solvable analytical models. Theoretically, it proposes a unifying view of unrolling and analytical differentiation through constrained optimization mappings.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Machine Learning -> ML: Applications
Machine Learning -> ML: Optimization
1384
Relation-enhanced DETR for Component Detection in Graphic Design Reverse Engineering
Xixuan Hao, Danqing Huang, Jieru Lin, Chin-Yew Lin
It is a common practice for designers to create digital prototypes from a mock-up/screenshot. Reverse engineering graphic design by detecting its components (e.g., text, icon, button) helps expedite this process. This paper first conducts a statistical analysis to emphasize the importance of relations in graphic layouts, which further motivates us to incorporate relation modeling into component detection. Built on the current state-of-the-art DETR (DEtection TRansformer), we introduce a learnable relation matrix to model class correlations. Specifically, the matrix will be added in the DETR decoder to update the query-to-query self-attention. Experiment results on three public datasets show that our approach achieves better performance than several strong baselines. We further visualize the learnt relation matrix and observe some reasonable patterns. Moreover, we show an application of component detection where we leverage the detection outputs as augmented training data for layout generation, which achieves promising results.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
1390
Reducing Communication for Split Learning by Randomized Top-k Sparsification
Fei Zheng, Chaochao Chen, Lingjuan Lyu, Binhui Yao
Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-$k$ sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-$k$ sparsification, we further propose randomized top-$k$ sparsification, to make the model generalize and converge better. This is done by selecting top-$k$ elements with a large probability while also having a small probability to select non-top-$k$ elements. Empirical results show that compared with other communication-reduction methods, our proposed randomized top-$k$ sparsification achieves a better model performance under the same compression level.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Learning sparse models
Multidisciplinary Topics and Applications -> MDA: Security and privacy
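The idea of randomized top-$k$ sparsification in the abstract above (top-$k$ elements are selected with large probability, non-top-$k$ elements with small probability) can be sketched as follows. The abstract does not give the exact selection probabilities, so the per-slot scheme and the parameter `alpha` below are assumptions chosen for illustration, not the authors' algorithm.

```python
import random


def randomized_top_k(values, k, alpha, rng):
    """Keep k entries of `values` and zero out the rest.

    Each of the k slots is filled from the top-k set (by magnitude) with
    probability 1 - alpha and from the remaining entries with probability
    alpha. With alpha = 0 this reduces to plain top-k sparsification.
    """
    k = min(k, len(values))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    top, rest = order[:k], order[k:]
    keep = set()
    for _ in range(k):
        # Fall back to whichever pool is non-empty if the other runs out.
        use_rest = bool(rest) and (not top or rng.random() < alpha)
        if use_rest:
            keep.add(rest.pop(rng.randrange(len(rest))))  # uniform non-top-k pick
        else:
            keep.add(top.pop(0))  # largest remaining top-k entry
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


# Hypothetical cut-layer activations.
activations = [0.1, -2.0, 0.5, 3.0, -0.2]
exact = randomized_top_k(activations, k=2, alpha=0.0, rng=random.Random(0))
noisy = randomized_top_k(activations, k=2, alpha=0.3, rng=random.Random(0))
print(exact)  # [0.0, -2.0, 0.0, 3.0, 0.0]
print(noisy)  # always exactly two non-zero entries
```

Either way only k values (plus their indices) need to be transmitted, so the compression level matches plain top-$k$; the randomness only changes which entries are sent.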
1412
Towards Generalizable Reinforcement Learning for Trade Execution
Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao
Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model the optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning
1424
Dynamic Group Link Prediction in Continuous-Time Interaction Network
Shijie Luo, He Li, Jianbin Huang
Recently, group link prediction has received increasing attention due to its important role in analyzing relationships between individuals and groups. However, most existing group link prediction methods emphasize static settings or make only cursory use of historical information, so they fail to perform well in dynamic applications. To this end, we attempt to solve the group link prediction problem in continuous-time dynamic scenes with fine-grained temporal information. We propose a novel continuous-time group link prediction method, CTGLP, to capture the patterns of future link formation between individuals and groups. A new graph neural network, CTGNN, is presented to learn the latent representations of individuals by aggregating neighborhood information in a biased manner. Moreover, we design an importance-based group modeling function to model the embedding of a group based on its known members. CTGLP eventually learns a probability distribution and predicts the link target. Experimental results on various datasets with and without unseen nodes show that CTGLP outperforms the state-of-the-art methods by 13.4% and 13.2% on average.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
Multidisciplinary Topics and Applications -> MDA: Social sciences
1426
Contrastive Learning and Reward Smoothing for Deep Portfolio Management
Yun-Hsuan Lien, Yuan-Kui Li, Yu-Shuen Wang
In this study, we used reinforcement learning (RL) models to invest assets in order to earn returns. The models were trained to interact with a simulated environment based on historical market data and learn trading strategies. However, using deep neural networks based on the returns of each period can be challenging due to the unpredictability of financial markets. As a result, the policies learned from training data may not be effective when tested in real-world situations. To address this issue, we incorporated contrastive learning and reward smoothing into our training process. Contrastive learning allows the RL models to recognize patterns in asset states that may indicate future price movements. Reward smoothing, on the other hand, serves as a regularization technique to prevent the models from seeking immediate but uncertain profits. We tested our method against various traditional financial techniques and other deep RL methods, and found it to be effective in both the U.S. stock market and the cryptocurrency market. Our source code will be made available for public access.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Relational learning
Machine Learning -> ML: Representation learning
1441
ViT-P3DE∗: Vision Transformer Based Multi-Camera Instance Association with Pseudo 3D Position Embeddings
Minseok Seo, Hyuk-Jae Lee, Xuan Truong Nguyen
Multi-camera instance association, which identifies identical objects among multiple objects in multi-view images, is challenging due to several harsh constraints. To tackle this problem, most studies have employed CNNs as feature extractors but often fail under such harsh constraints. Inspired by Vision Transformer (ViT), we first develop a pure ViT-based framework for robust feature extraction through self-attention and residual connection. We then propose two novel methods to achieve robust feature learning. First, we introduce learnable pseudo 3D position embeddings (P3DEs) that represent the 3D location of an object in the world coordinate system, which is independent of the harsh constraints. To generate P3DEs, we encode the camera ID and the object’s 2D position in the image using embedding tables. We then build a framework that trains P3DEs to represent an object’s 3D position in a weakly supervised manner. Second, we also utilize joint patch generation (JPG). During patch generation, JPG considers an object and its surroundings as a single input patch to reinforce the relationship information between two features. Ultimately, experimental results demonstrate that both ViT-P3DE and ViT-P3DE with JPG achieve state-of-the-art performance and significantly outperform existing works, especially when dealing with extremely harsh constraints.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
1442
Black-Box Data Poisoning Attacks on Crowdsourcing
Pengpeng Chen, Yongqiang Yang, Dingqi Yang, Hailong Sun, Zhijun Chen, Peng Lin
Understanding the vulnerability of label aggregation against data poisoning attacks is key to ensuring data quality in crowdsourced label collection. State-of-the-art attack mechanisms generally assume full knowledge of the aggregation models while failing to consider the flexibility of malicious workers in selecting which instances to label. Such a setup limits the applicability of the attack mechanisms and impedes further improvement of their success rate. This paper introduces a black-box data poisoning attack framework that finds the optimal strategies for instance selection and labeling to attack unknown label aggregation models in crowdsourcing. We formulate the attack problem on top of a generic formalization of label aggregation models and then introduce a substitution approach that attacks a substitute aggregation model in replacement of the unknown model. Through extensive validation on multiple real-world datasets, we demonstrate the effectiveness of both instance selection and model substitution in improving the success rate of attacks.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Robustness
1456
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond
Zhu Liu, Jinyuan Liu, Guanyao Wu, Long Ma, Xin Fan, Risheng Liu
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts tend to boost a single task unilaterally while neglecting the others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish a hierarchical dual-task-driven deep model to bridge these tasks. Concretely, we first construct an image fusion module to fuse complementary characteristics and cascade dual task-related modules, including a discriminator for visual effects and a semantic network for feature measurement. We provide a bi-level perspective to formulate image fusion and follow-up downstream tasks. To incorporate distinct task-related responses for image fusion, we consider image fusion as a primary goal and dual modules as learnable constraints. Furthermore, we develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning. Extensive experiments demonstrate the superiority of our method, which not only produces visually pleasant fused results but also achieves significant improvements in detection and segmentation over state-of-the-art approaches.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding
1461
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya, Michael Ryoo
Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years. Such methods usually have a common pipeline: a tokenization method, followed by a set of layers/blocks for information mixing, both within and among tokens. When image patches are converted into tokens, they are often flattened, discarding the spatial structure within each patch. As a result, any processing that follows (e.g., multi-head self-attention) may fail to recover and/or benefit from such information. In this paper, we argue that models can have significant gains when spatial structure is preserved during tokenization and is explicitly used during the mixing stage. We propose two key contributions: (1) Structure-aware Tokenization and (2) Structure-aware Mixing, both of which can be combined with existing models with minimal effort. We introduce a family of models (SWAT), showing improvements over the likes of DeiT, MLP-Mixer and Swin Transformer, across multiple benchmarks including ImageNet classification and ADE20K segmentation. Our code and models will be released online.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
1476
Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting
Ren-Jian Wang, Ke Xue, Haopu Shang, Chao Qian, Haobo Fu, Qiang Fu
Quality-Diversity (QD) algorithms, a subset of evolutionary algorithms, maintain an archive (i.e., a set of solutions) and simulate the natural evolution process through iterative selection and reproduction, with the goal of generating a set of high-quality and diverse solutions. Though they have found many successful applications in reinforcement learning, QD algorithms often select the parent solutions uniformly at random, which lacks selection pressure and may limit the performance. Recent studies have treated each type of behavior of a solution as an objective, and selected the parent solutions based on Multi-objective Optimization (MO), which is a natural idea but has not led to the expected performance. This paper gives the reason for the first time, and then proposes a new MO-based selection method by non-surrounded-dominated sorting (NSS), which considers all possible directions of the behaviors, and thus can generate diverse solutions over the whole behavior space. By combining NSS with the most widespread QD algorithm, MAP-Elites, we perform experiments on synthetic functions and several complex tasks (i.e., QDGym, robotic arm, and Mario environment generation), showing that NSS achieves better performance than not only other MO-based selection methods but also state-of-the-art selection methods in QD.
List of keywords
Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Reinforcement learning
Search -> S: Evolutionary computation
1479
Stability and Generalization of $\ell_p$-Regularized Stochastic Learning for GCN
Shaogao Lv, Shiyu Liu, Linsen Wei, Ming Li
Graph convolutional networks (GCN) are viewed as one of the most popular representations among the variants of graph neural networks over graph data, and have shown powerful performance in empirical experiments. $\ell_2$-based graph smoothing enforces global smoothness of GCN, while (soft) $\ell_1$-based sparse graph learning tends to promote signal sparsity at the cost of discontinuity. The current paper aims at quantifying the trade-off of GCN between smoothness and sparsity, with the help of a general $\ell_p$-regularized $(1<p\leq 2)$ stochastic learning scheme proposed in the paper. While stability-based generalization analyses have been given in prior work for second-order differentiable objective functions, our $\ell_p$-regularized learning scheme does not satisfy such a smoothness condition. To address this issue, we propose a novel SGD proximal algorithm for GCN with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with the $\ell_p$-regularized stochastic learning by analyzing the stability of our SGD proximal algorithm. Several empirical experiments are implemented to validate our theoretical findings.
List of keywords
Uncertainty in AI -> UAI: Graphical models
1493
On the Reuse Bias in Off-Policy Reinforcement Learning
Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu
Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias of IS — the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that the off-policy evaluation and optimization of the current policy with the data from the replay buffer result in an overestimation of the objective, which may cause an erroneous gradient update and degenerate the performance. We further provide a high-probability upper bound of the Reuse Bias and show that controlling one term of the upper bound can control the Reuse Bias by introducing the concept of stability for off-policy algorithms. Based on these analyses, we present a novel yet simple Bias-Regularized Importance Sampling (BIRIS) framework along with practical algorithms, which can alleviate the negative impact of the Reuse Bias, and show that our BIRIS can significantly reduce the Reuse Bias empirically. Moreover, extensive experimental results show that our BIRIS-based methods can significantly improve the sample efficiency on a series of continuous control tasks in MuJoCo.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
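For readers unfamiliar with the importance sampling (IS) re-weighting that the abstract above builds on, the ordinary trajectory-wise IS estimator can be sketched as follows. This illustrates only the standard estimator, not the Reuse Bias analysis or the BIRIS regularizer; the function name `is_estimate` and the data layout are assumptions made for the example.

```python
import math


def is_estimate(trajectories):
    """Ordinary importance-sampling estimate of the target policy's return.

    Each trajectory is a pair (steps, ret): `steps` is a list of
    (target_prob, behavior_prob) for the actions actually taken, and `ret`
    is the return observed under the behavior policy.
    """
    total = 0.0
    for steps, ret in trajectories:
        # Likelihood ratio of the whole trajectory under the two policies.
        weight = math.prod(p_target / p_behavior for p_target, p_behavior in steps)
        total += weight * ret
    return total / len(trajectories)


# Hypothetical replay buffer with two short trajectories.
buffer = [
    ([(0.5, 0.25), (0.5, 0.5)], 1.0),  # weight 2.0
    ([(0.25, 0.5), (0.5, 0.5)], 4.0),  # weight 0.5
]
print(is_estimate(buffer))  # 2.0
```

The reuse bias discussed in the abstract arises when the same buffer is used both to compute such an estimate and to optimize the policy being evaluated.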
1505
Clustered-patch Element Connection for Few-shot Learning
Jinxiang Lai, Siqian Yang, Junhong Zhou, Wenlong Wu, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Chengjie Wang
The weak feature representation problem has long limited the performance of few-shot classification. To alleviate this problem, recent work builds connections between support and query instances by embedding patch features to generate discriminative representations. However, we observe that there exist semantic mismatches (foreground/background) among these local patches, because the location and size of the target object are not fixed. Worse, these mismatches result in unreliable similarity confidences, and complex dense connections exacerbate the problem. Accordingly, we propose a novel Clustered-patch Element Connection (CEC) layer to correct the mismatch problem. The CEC layer leverages Patch Cluster and Element Connection operations to collect and establish reliable connections with high similarity patch features, respectively. Moreover, we propose CECNet, which includes a CEC layer based attention module and a CEC based distance metric. The former is utilized to generate a more discriminative representation benefiting from the global clustered-patch features, and the latter is introduced to reliably measure the similarity between pair-features. Extensive experiments demonstrate that our CECNet outperforms the state-of-the-art methods on multiple classification benchmark datasets. Furthermore, our CEC approach can be extended to few-shot segmentation and detection tasks and achieves competitive improvements.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1510
Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification
Zhiwei Wang, Junlin Xian, Kangyi Liu, Xin Li, Qiang Li, Xin Yang
Mammogram image is important for breast cancer screening, and typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information for clinical decisions. However, previous methods mostly learn features from the two views independently, which violates the clinical knowledge and ignores the importance of dual-view correlation in the feature learning. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep feature maps for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly-correlated with each other. Experimental results on the two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that the DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms previous state-of-the-art methods for classifying a whole mammogram as malignant or not.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Multi-view learning
1529
Image Composition with Depth Registration
Zan Li, Wencheng Wang, Fei Hou
Handling occlusions is still a challenging problem for image composition. Existing approaches either require the source contents to be completely in front of the target contents or need manual intervention to adjust occlusions, which is very tedious. Though several methods have suggested exploiting priors or learning techniques to improve occlusion determination, their potential is quite limited. This paper addresses the challenge by presenting a depth registration method for merging the source contents seamlessly into the 3D space that the target image represents. Thus, the occlusions between the source contents and target contents can be conveniently handled through pixel-wise depth comparisons, allowing the user to focus more efficiently on the design of the image composition. Experimental results show that we can conveniently handle occlusions in image composition and improve efficiency by about 4 times compared to Photoshop.
List of keywords
Computer Vision -> CV: Scene analysis and understanding   
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
1539
IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse
Yang Li, Liangliang Shi, Junchi Yan
Despite their success, generative adversarial networks (GANs) still suffer from mode collapse, i.e., the generator can only map latent variables to a partial set of modes in the target distribution. In this paper, we analyze and seek to regularize this issue from an independent and identically distributed (IID) sampling perspective, and emphasize that preserving the IID property with respect to the target distribution during generation can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples {z} are IID, the generations {G(z)} are not necessarily IID samples from the target distribution. Based on this observation, and considering a necessary condition of IID generation, we propose a new loss that encourages closeness between the inverse samples of real data and the Gaussian source in the latent space, regularizing the generation to be IID from the target distribution. The logic is that the inverse samples from target data should also be IID in the source distribution. Experiments on both synthetic and real-world data show the effectiveness of our model.
List of keywords
Machine Learning -> ML: Generative adversarial networks
Computer Vision -> CV: Neural generative models, auto encoders, GANs
1540
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification
Likang Wu, Junji Jiang, Hongke Zhao, Hao Wang, Defu Lian, Mengdi Zhang, Enhong Chen
Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features’ prototypes and labels’ semantics, thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e., the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It is necessary to separate and judge the semantic factors that tremendously affect the cognitive ability in order to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. The content of each node is then reconstructed into a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph’s instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF in comparison with state-of-the-art baselines.
List of keywords
Data Mining -> DM: Applications
Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Mining graphs
1542
Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning
Liqi Yan, Cheng Han, Zenglin Xu, Dongfang Liu, Qifan Wang
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether prompts in fine-tuning can learn knowledge-aware prompts from pre-training, by designing two different sets of prompts in the pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt) approach for video captioning, which first efficiently pre-trains a video-language model to extract key information (e.g., actions and objects) with a flexibly generated Knowledge-Aware Prompt (KAP). Then, we design a Video-Language Prompt (VLP) to transfer the knowledge from the knowledge-aware prompts and fine-tune the model to generate full captions. Experimental results show the superior performance of our approach over several state-of-the-art baselines. We further demonstrate that the video-language prompts are well learned from the knowledge-aware prompts.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Vision and language 
1560
Abstraction of Nondeterministic Situation Calculus Action Theories
Bita Banihashemi, Giuseppe De Giacomo, Yves Lesperance
We develop a general framework for abstracting the behavior of an agent that operates in a nondeterministic domain, i.e., where the agent does not control the outcome of the nondeterministic actions, based on the nondeterministic situation calculus and the ConGolog programming language. We assume that we have both an abstract and a concrete nondeterministic basic action theory, and a refinement mapping which specifies how abstract actions, decomposed into agent actions and environment reactions, are implemented by concrete ConGolog programs. This new setting supports strategic reasoning and strategy synthesis, by allowing us to quantify separately on agent actions and environment reactions. We show that if the agent has a (strong FOND) plan/strategy to achieve a goal/complete a task at the abstract level, and it can always execute the nondeterministic abstract actions to completion at the concrete level, then there exists a refinement of it that is a (strong FOND) plan/strategy to achieve the refinement of the goal/task at the concrete level.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
1572
Overlooked Implications of the Reconstruction Loss for VAE Disentanglement
Nathan Michlo, Richard Klein, Steven James
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Unsupervised learning
1580
On Conditional and Compositional Language Model Differentiable Prompting
Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (ProPS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules — neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that ProPS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Neuro-symbolic methods
1585
Domain-Adaptive Self-Supervised Face & Body Detection in Drawings
Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin
Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Machine Learning -> ML: Self-supervised Learning
1588
Accurate MRI Reconstruction via Multi-Domain Recurrent Networks
Jinbao Wei, Zhijie Wang, Kongqiao Wang, Li Guo, Xueyang Fu, Ji Liu, Xun Chen
In recent years, deep convolutional neural networks (CNNs) have become dominant in MRI reconstruction from undersampled k-space. However, most existing CNN methods reconstruct the undersampled images either in the spatial domain or in the frequency domain, neglecting the correlation between these two domains. This hinders further improvement in reconstruction performance. To tackle this issue, in this work, we propose a new multi-domain recurrent network (MDR-Net) with multi-domain learning (MDL) blocks as its basic units to reconstruct the undersampled MR image progressively. Specifically, the MDL block interactively processes the local spatial features and the global frequency information to facilitate complementary learning, leading to fine-grained feature generation. Furthermore, we introduce an effective frequency-based loss to narrow the frequency spectrum gap, compensating for the over-smoothness caused by the widely used spatial reconstruction loss. Extensive experiments on public fastMRI datasets demonstrate that our MDR-Net consistently outperforms other competitive methods and is able to provide more details.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Applications
1590
A Fast Algorithm for Consistency Checking Partially Ordered Time
Leif Eriksson, Victor Lagerkvist
Partially ordered models of time occur naturally in applications where agents/processes cannot perfectly communicate with each other, and can be traced back to the seminal work of Lamport. In this paper we consider the problem of deciding if a (likely incomplete) description of a system of events is consistent, the network consistency problem for the point algebra of partially ordered time (POT). While the classical complexity of this problem has been fully settled, comparatively little is known of the fine-grained complexity of POT except that it can be solved in O*((0.368n)^n) time by enumerating ordered partitions. We construct a much faster algorithm with a run-time bounded by O*((0.26n)^n), which, e.g., is roughly 1000 times faster than the naive enumeration algorithm in a problem with 20 events. This is achieved by a sophisticated enumeration of structures similar to total orders, which are then greedily expanded toward a solution. While similar ideas have been explored earlier for related problems, it turns out that the analysis for POT is non-trivial and requires significant new ideas.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
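The "roughly 1000 times faster" figure in the abstract above can be checked with a short calculation. The sketch below assumes the two bounds are (0.368n)^n and (0.26n)^n and ignores the polynomial factors hidden by the O* notation; the function name `bound_ratio` is hypothetical and only serves this illustration.

```python
def bound_ratio(n: int, slow_c: float = 0.368, fast_c: float = 0.26) -> float:
    """Ratio (slow_c * n)^n / (fast_c * n)^n of the two running-time bounds.

    The n^n factors cancel, so the ratio reduces to (slow_c / fast_c)^n.
    Polynomial factors hidden by O* are ignored (an assumption of this sketch).
    """
    return (slow_c / fast_c) ** n


# For a problem with 20 events the ratio is on the order of 10^3.
ratio_20 = bound_ratio(20)
print(f"bound ratio for n = 20: {ratio_20:.0f}")
```

Because the ratio grows as (0.368/0.26)^n, the gap between the two bounds widens exponentially with the number of events.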
1593
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia
Scene text image super-resolution (STISR), aiming to improve image quality while boosting scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, while neglecting the disturbance from the complex background, thus limiting the performance. To address this issue, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on an attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into the super-resolution branch to reconstruct high-quality recognizable scene text images. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
1596
RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search
Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang
Text-based person search aims to retrieve the specified person images given a textual description. The key to tackling such a challenging task is to learn powerful multi-modal representations. Towards this, we propose a Relation and Sensitivity aware representation learning method (RaSa), including two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). For one thing, existing methods cluster representations of all positive pairs without distinction and overlook the noise problem caused by the weak positive pairs, where the text and the paired image have noisy correspondences, thus leading to overfitting. RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs). For another thing, learning invariant representations under data augmentation (i.e., being insensitive to some transformations) is a general practice for improving the representation’s robustness in existing methods. Beyond that, we encourage the representation to perceive the sensitive transformation by SA (i.e., learning to detect the replaced words), thus promoting the representation’s robustness. Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on the CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively. Our code will be released.
List of keywords
Computer Vision -> CV: Vision and language 
1598
Improved Algorithms for Allen’s Interval Algebra by Dynamic Programming with Sublinear Partitioning
Leif Eriksson, Victor Lagerkvist
Allen’s interval algebra is one of the most well-known calculi in qualitative temporal reasoning with numerous applications in artificial intelligence. Very recently, there has been a surge of improvements in the fine-grained complexity of NP-hard reasoning tasks in this algebra, which has improved the running time from the naive 2^O(n^2) to O*((1.0615n)^n), and even faster algorithms are known for unit intervals and the case when we have a bounded number of overlapping intervals. Despite these improvements, the best known lower bound is still only 2^o(n) under the exponential-time hypothesis, and major improvements in either direction seemingly require fundamental advances in computational complexity. In this paper we propose a novel framework for solving NP-hard qualitative reasoning problems which we refer to as dynamic programming with sublinear partitioning. Using this technique we obtain a major improvement of O*((cn/log(n))^n) for Allen’s interval algebra. To demonstrate that the technique is applicable to further problem domains we apply it to a problem in qualitative spatial reasoning, the cardinal direction calculus, and solve it in O*((cn/log(n))^(2n/3)) time. Hence, not only do we significantly advance the state-of-the-art for NP-hard qualitative reasoning problems, but we also obtain a novel algorithmic technique that is likely applicable to many problems where 2^O(n) time algorithms are unlikely.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
1603
Error in the Euclidean Preference Model
Luke Thorburn, Maria Polukarov, Carmine Ventre
Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their "ideal point", as measured by the Euclidean metric. However, Bogomolnaia & Laslier (2007) showed that there exist ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are realistic situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Machine Learning -> ML: Learning preferences or rankings
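The Euclidean preference structure described in the abstract above (an individual prefers alternatives closer to their ideal point) can be illustrated with a minimal sketch. The function name `euclidean_ranking`, the alternative labels, and the coordinates are all hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def euclidean_ranking(
    ideal_point: Sequence[float],
    alternatives: Dict[str, Sequence[float]],
) -> List[str]:
    """Order alternatives from most to least preferred.

    Under the Euclidean model, preference is determined solely by the
    Euclidean distance between the ideal point and each alternative.
    """
    return sorted(
        alternatives,
        key=lambda name: math.dist(ideal_point, alternatives[name]),
    )


# Hypothetical two-dimensional embedding of three alternatives.
alternatives = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 4.0)}
ranking = euclidean_ranking((0.9, 0.1), alternatives)
print(ranking)
```

An ordinal preference profile is representable in this model only if some placement of ideal points and alternatives reproduces every individual's ranking in this way, which is the property the abstract's impossibility results concern.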
1604
A Fast Adaptive Randomized PCA Algorithm
Xu Feng, Wenjian Yu
It is desirable to adaptively determine the number of dimensions (rank) for PCA according to a given tolerance of low-rank approximation error. In this work, we aim to develop a fast algorithm solving this adaptive PCA problem. We propose to replace the QR factorization in the randQB_EI algorithm with matrix multiplication and inversion of small matrices, and propose a new error indicator to incrementally evaluate approximation error in Frobenius norm. Combining the shifted power iteration technique for better accuracy, we finally build up an algorithm named farPCA. Experimental results show that farPCA is much faster than the baseline methods (randQB_EI, randUBV and svds) in the practical setting of multi-thread computing, while producing nearly optimal results of adaptive PCA.
List of keywords
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Data Mining -> DM: Big data and scalability
Data Mining -> DM: Theoretical foundations of data mining
1621
Specifying and Testing k-Safety Properties for Machine-Learning Models
Maria Christakis, Hasan Ferit Eniser, Jörg Hoffmann, Adish Singla, Valentin Wüstholz
Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about k different executions—so-called k-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of k-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Software engineering
Agent-based and Multi-agent Systems -> MAS: Engineering methods, platforms, languages and tools
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
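The loan example in the abstract above is a 2-safety property because it relates two executions of the same model. A minimal sketch of checking it by metamorphic testing follows. Both toy models, the field names, and the helper functions are hypothetical; this is not the paper's specification language or framework, only an illustration of the idea.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]
Model = Callable[[Applicant], bool]  # True means the loan is approved


def toy_credit_model(applicant: Applicant) -> bool:
    """Hypothetical model: approve when income is at least 25% of the loan."""
    return applicant["income"] >= 0.25 * applicant["loan_amount"]


def buggy_credit_model(applicant: Applicant) -> bool:
    """Hypothetical faulty model that also approves very low incomes."""
    return toy_credit_model(applicant) or applicant["income"] < 10_000


def violates_property(model: Model, applicant: Applicant, income_drop: float) -> bool:
    """Check the 2-safety property on a pair of executions.

    Property: if the applicant is denied, then the same applicant with a
    lower income must still be denied.
    """
    follow_up = dict(applicant, income=applicant["income"] - income_drop)
    denied_before = not model(applicant)
    approved_after = model(follow_up)
    return denied_before and approved_after


def find_violations(
    model: Model, applicants: List[Applicant], income_drops: List[float]
) -> List[Tuple[Applicant, float]]:
    """Return every (applicant, income_drop) pair that violates the property."""
    return [
        (applicant, drop)
        for applicant in applicants
        for drop in income_drops
        if violates_property(model, applicant, drop)
    ]


applicants = [
    {"income": 20_000.0, "loan_amount": 100_000.0},
    {"income": 50_000.0, "loan_amount": 100_000.0},
]
income_drops = [5_000.0, 15_000.0]

ok_violations = find_violations(toy_credit_model, applicants, income_drops)
bad_violations = find_violations(buggy_credit_model, applicants, income_drops)
print(len(ok_violations), len(bad_violations))
```

No single execution can witness a violation here; the check needs both the original input and its transformed follow-up, which is what distinguishes a k-safety property from an ordinary per-input assertion.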
1624
Hierarchical Transformer for Scalable Graph Learning
Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency.
List of keywords
Machine Learning -> ML: Sequence and graph learning
1626
PowerBEV: A Powerful yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, Jürgen Gall
Accurately perceiving instances and predicting their future motion are key tasks for autonomous vehicles, enabling them to navigate safely in complex urban traffic. While bird’s-eye view (BEV) representations are commonplace in perception for autonomous driving, their potential in a motion prediction setting is less explored. Existing approaches for BEV instance prediction from surround cameras rely on a multi-task auto-regressive setup coupled with complex post-processing to predict future instances in a spatio-temporally consistent manner. In this paper, we depart from this paradigm and propose an efficient novel end-to-end framework named PowerBEV, which differs in several design choices aimed at reducing the inherent redundancy in previous methods. First, rather than predicting the future in an auto-regressive fashion, PowerBEV uses a parallel, multi-scale module built from lightweight 2D convolutional networks. Second, we show that segmentation and centripetal backward flow are sufficient for prediction, simplifying previous multi-task objectives by eliminating redundant output modalities. Building on this output representation, we propose a simple, flow warping-based post-processing approach which produces more stable instance associations across time. Through this lightweight yet powerful design, PowerBEV outperforms state-of-the-art baselines on the NuScenes Dataset and poses an alternative paradigm for BEV instance prediction. Code will be released upon publication.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Segmentation
1630
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie
Streaming perception is a vital aspect of autonomous driving, yet previous research has lacked systematic examination. To address this, we propose the optimized framework, DAMO-StreamNet, which incorporates recent advances from the YOLO series and conducts a comprehensive analysis of spatial and temporal perception mechanisms to provide a state-of-the-art solution. The key innovations of DAMO-StreamNet include 1) utilization of a robust neck structure incorporating deformable convolution, which improves the receptive field and feature alignment abilities, 2) introduction of a dual-branch structure for extracting longer time-series information, resulting in improved prediction accuracy for motion states, 3) distillation at the logits level, which aligns the logits of the teacher and student models to the semantic space for more efficient optimization, and 4) a real-time forecasting mechanism that updates the support frame features with the current frame before the next prediction in the inference phase to handle real-time streaming perception. Our experiments have shown that DAMO-StreamNet outperforms existing SOTA methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using any extra data. This work not only establishes a new benchmark for streaming perception but also provides valuable insights for future research. Moreover, DAMO-StreamNet can be applied to various types of autonomous systems, such as drones and robots, enabling real-time and accurate perception of the environment.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
1633
Differentiable Economics for Randomized Affine Maximizer Auctions
Michael Curry, Tuomas Sandholm, John Dickerson
A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, such an architecture does not exist. There are single-bidder approaches (MenuNet, RochetNet) which are always strategyproof and can represent optimal mechanisms. RegretNet is multi-bidder and can approximate any mechanism, but is only approximately strategyproof. We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism. This architecture is the classic affine maximizer auction (AMA), modified to offer lotteries. By using the gradient-based optimization tools of differentiable economics, we can now train lottery AMAs, competing with or outperforming prior approaches in revenue.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
1635
A Bitwise GAC Algorithm For Alldifferent Constraints
Zhe Li, Yaohua Wang, Zhanshan Li
The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is excavating and enumerating all the strongly connected components (SCCs) of the graph model. This requires a large number of complex data structures to maintain the node information, leading to a large overhead in both time and memory space. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. To solve this problem, the key observation of this paper is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCC it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead, and enhances the ability to incorporate existing optimizations to work in a synergistic way. Our experiments show that AllDiffbit outperforms the state-of-the-art GAC algorithms by over 60%.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
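The observation in the abstract above, that only membership in an SCC matters and not which SCC, can be illustrated with bitsets. The sketch below is not the AllDiffbit algorithm; it is a generic illustration under the assumption that "in an SCC" means lying on a directed cycle (a non-trivial SCC). Each node's successor set is stored as an integer bitmask, and all function names are hypothetical.

```python
from typing import List


def reachability_bitsets(successors: List[int]) -> List[int]:
    """Compute, for each node v, a bitmask of nodes reachable in >= 1 step.

    successors[v] is an int whose bit u is set when the edge v -> u exists.
    Masks are merged with bitwise OR until a fixed point is reached.
    """
    reach = list(successors)
    changed = True
    while changed:
        changed = False
        for v in range(len(reach)):
            merged = reach[v]
            remaining = reach[v]
            while remaining:
                low_bit = remaining & -remaining  # isolate lowest set bit
                u = low_bit.bit_length() - 1      # index of that bit
                merged |= reach[u]
                remaining ^= low_bit
            if merged != reach[v]:
                reach[v] = merged
                changed = True
    return reach


def on_cycle_mask(successors: List[int]) -> int:
    """Bitmask of nodes that can reach themselves, i.e. lie on a cycle."""
    mask = 0
    for v, reachable in enumerate(reachability_bitsets(successors)):
        if (reachable >> v) & 1:
            mask |= 1 << v
    return mask


# Hypothetical graph: 0 -> 1 -> 2 -> 0 forms a cycle, and 2 -> 3 leaves it.
example_successors = [0b0010, 0b0100, 0b1001, 0b0000]
cycle_nodes = on_cycle_mask(example_successors)
print(bin(cycle_nodes))
```

The result is a single membership mask rather than a list of components, so no per-component bookkeeping is needed to answer the in-or-out question.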
1639
Bipolar Abstract Dialectical Frameworks Are Covered by Kleene’s Three-Valued Logic
Ringo Baumann, Maximilian Heinrich
Abstract dialectical frameworks (ADFs) are one of the most powerful generalizations of classical Dung-style argumentation frameworks (AFs). The additional expressive power comes with an increase in computational complexity, namely one level up in the polynomial hierarchy in comparison to their AF counterparts. However, there is one important subclass, so-called bipolar ADFs (BADFs), which are as complex as classical AFs while offering strictly more modeling capacities. This property makes BADFs very attractive from a knowledge representation point of view and is the main reason why this class has received much attention recently. The semantics of ADFs rely on the Gamma-operator, which takes as input a three-valued interpretation and returns a new one. However, in order to obtain the output, the original definition requires considering every two-valued completion of a given three-valued interpretation. In this paper we formally prove that in the case of BADFs we may bypass this computationally intensive procedure by applying Kleene’s three-valued logic K. We therefore introduce the so-called bipolar disjunctive normal form, which is simply a disjunctive normal form where any used atom possesses either a positive or a negative polarity. We then show that, first, this normal form is expressive enough to represent any BADF and, second, the computation can be done via Kleene’s K instead of dealing with two-valued completions. Inspired by the main correspondence result, we present some first experiments showing the computational benefit of using Kleene.
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
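The correspondence described in the abstract of paper 1639 can be illustrated on a single formula. The following is a minimal sketch, not the authors' implementation: the formula encoding (a list of conjunctions of `(atom, is_positive)` literals), the numeric truth values, and the function names `kleene_dnf` and `completion_dnf` are all assumptions made for illustration. It evaluates a disjunctive normal form once with Kleene's strong three-valued connectives and once by enumerating every two-valued completion, and shows that the two agree when each atom occurs with a single polarity.

```python
from itertools import product

# Truth values ordered false < unknown < true (an encoding chosen for this sketch).
T, U, F = 1.0, 0.5, 0.0

def kleene_dnf(dnf, interp):
    """Evaluate a DNF with Kleene's connectives: not = 1 - v, and = min, or = max."""
    def literal(atom, positive):
        value = interp[atom]
        return value if positive else 1.0 - value
    return max(
        (min((literal(a, p) for a, p in conj), default=T) for conj in dnf),
        default=F,
    )

def completion_dnf(dnf, interp):
    """Evaluate by enumerating all two-valued completions of the unknown atoms.

    Returns T or F if every completion agrees, and U otherwise.
    """
    unknown = [a for a, v in interp.items() if v == U]
    results = set()
    for bits in product((F, T), repeat=len(unknown)):
        completed = dict(interp)
        completed.update(zip(unknown, bits))
        results.add(kleene_dnf(dnf, completed))  # classical on two-valued input
    return results.pop() if len(results) == 1 else U

# Bipolar DNF: 'a' and 'c' occur only positively, 'b' only negatively.
bipolar = [[("a", True), ("b", False)], [("c", True)]]
# Not bipolar: 'a' occurs with both polarities (a or not a).
non_bipolar = [[("a", True)], [("a", False)]]

agree = all(
    kleene_dnf(bipolar, dict(zip("abc", vals)))
    == completion_dnf(bipolar, dict(zip("abc", vals)))
    for vals in product((F, U, T), repeat=3)
)
```

On the non-bipolar formula the two evaluations differ when `a` is unknown: Kleene's logic returns unknown, while every completion makes the formula true. The Kleene evaluation needs one pass over the formula, whereas the completion-based evaluation is exponential in the number of unknown atoms.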
1654
SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation
Shuyi Ouyang, Hongyi Wang, Shiao Xie, Ziwei Niu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin
Referring image segmentation aims to segment an object out of an image via a specific language expression. The main concept is establishing global visual-linguistic relationships to locate the object and identify boundaries using details of the image. Recently, various Transformer-based techniques have been proposed to efficiently leverage long-range cross-modal dependencies, enhancing performance for referring segmentation. However, existing methods consider visual feature extraction and cross-modal fusion separately, resulting in insufficient visual-linguistic alignment in semantic space. In addition, they employ sequential structures and hence lack multi-scale information interaction. To address these limitations, we propose a Scale-Wise Language-Guided Vision Transformer (SLViT) with two appealing designs: (1) Language-Guided Multi-Scale Fusion Attention, a novel attention mechanism module for extracting rich local visual information and modeling global visual-linguistic relationships in an integrated manner. (2) An Uncertain Region Cross-Scale Enhancement module that can identify regions of high uncertainty using linguistic features and refine them via aggregated multi-scale features. We have evaluated our method on three benchmark datasets. The experimental results demonstrate that SLViT surpasses state-of-the-art methods with lower computational cost. The code is publicly available at: https://github.com/NaturalKnight/SLViT.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Segmentation
1666
Autonomous Exploration for Navigating in MDPs using Blackbox RL Algorithms
Pratik Gajane, Peter Auer, Ronald Ortner
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.
List of keywords
Machine Learning -> ML: Reinforcement learning
1679
Temporal Datalog with Existential Quantification
Matthias Lanzinger, Markus Nissl, Emanuel Sallinger, Przemysław Wałęga
Existential rules, also known as tuple-generating dependencies (TGDs) or Datalog+/- rules, are heavily studied in the communities of Knowledge Representation and Reasoning, Semantic Web, and Databases, due to their rich modelling capabilities. In this paper we consider TGDs in the temporal setting, by introducing and studying DatalogMTLE, an extension of metric temporal Datalog (DatalogMTL) obtained by allowing for existential rules in programs. We show that DatalogMTLE is undecidable even in the restricted cases of guarded and weakly-acyclic programs. To address this issue we introduce uniform semantics which, on the one hand, is well-suited for modelling temporal knowledge as it prevents unintended value invention and, on the other hand, provides decidability of reasoning; in particular, reasoning becomes 2-EXPSPACE-complete for weakly-acyclic programs but remains undecidable for guarded programs. We provide an implementation for the decidable case and demonstrate its practical feasibility. Thus we obtain an expressive, yet decidable, rule language and a system which is suitable for complex temporal reasoning with existential rules.
List of keywords
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
1685
Learning Heuristically-selected and Neurally-guided Feature for Age Group Recognition using Unconstrained Smartphone Interaction
Yingmao Miao, Qiwei Tian, Chenhao Lin, Tianle Song, Yajie Zhou, Junyi Zhao, Shuxin Gao, Chao Shen, Minghui Yang
Owing to the boom of smartphone industries, the expansion of phone users has also been significant. Besides adults, children and elders have also begun to join the population of daily smartphone users. Such an expansion indeed facilitates the further exploration of the versatility and flexibility of digitization. However, these new users may also be susceptible to issues such as addiction, fraud, and insufficient accessibility. To fully utilize the capability of mobile devices without breaching personal privacy, we build the first corpus for age group recognition on smartphones with more than 1,445,087 unrestricted actions from 2,100 subjects. Then a series of heuristically-selected and neurally-guided features are proposed to increase the separability of the above dataset. Finally, we develop AgeCare, the first implicit and continuous system incorporated with bottom-to-top functionality without any restriction on user-phone interaction scenarios, for accurate age group recognition and age-tailored assistance on smartphones. Our system performs impressively well on this dataset and significantly surpasses the state-of-the-art methods.
List of keywords
Humans and AI -> HAI: Human-computer interaction
Humans and AI -> HAI: Personalization and user modeling
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
1698
Unbiased Risk Estimator to Multi-Labeled Complementary Label Learning
Yi Gao, Miao Xu, Min-Ling Zhang
Multi-label learning (MLL) usually requires assigning multiple relevant labels to each instance. While a fully supervised MLL dataset needs a large amount of labeling effort, using complementary labels can help alleviate this burden. However, current approaches to learning from complementary labels are mainly designed for multi-class learning and assume that each instance has a single relevant label. This means that these approaches cannot be easily applied to MLL when only complementary labels are provided, where the number of relevant labels is unknown and can vary across instances. In this paper, we first propose the unbiased risk estimator for the multi-labeled complementary label learning (MLCLL) problem. We also provide an estimation error bound to ensure the convergence of the empirical risk estimator. In some cases, the unbiased estimator may give unbounded gradients for certain loss functions and result in overfitting. To mitigate this problem, we improve the risk estimator by minimizing a proper loss function, which has been shown to improve gradient updates. Our experimental results demonstrate the effectiveness of the proposed approach on various datasets.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Weakly supervised learning
1716
Exploring Leximin Principle for Fair Core-Selecting Combinatorial Auctions: Payment Rule Design and Implementation
Hao Cheng, Shufeng Kong, Yanchen Deng, Caihua Liu, Xiaohu Wu, Bo An, Chongjun Wang
Core-selecting combinatorial auctions (CAs) restrict the auction result in the core such that no coalitions could improve their utilities by engaging in collusion. The minimum-revenue-core (MRC) rule is a widely used core-selecting payment rule to maximize the total utilities of all bidders. However, the MRC rule can suffer from severe unfairness since it ignores individuals’ utilities. To address this limitation, we propose to explore the leximin principle to achieve fairness in core-selecting CAs since the leximin principle prefers to maximize the utility of the worst-off; the resulting bidder-leximin-optimal (BLO) payment rule is then theoretically analyzed and an effective algorithm is further provided to compute the BLO outcome. Moreover, we conduct extensive experiments to show that our algorithm returns fairer utility distributions and is faster than existing algorithms of core-selecting payment rules.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Computational social choice
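The leximin principle mentioned in the abstract of paper 1716 can be sketched as a comparison of utility vectors. This is only an illustration of the ordering itself, not the bidder-leximin-optimal payment rule, which must additionally keep the outcome in the core; the function names and the sample utility vectors are hypothetical.

```python
def leximin_key(utilities):
    """Sort utilities in ascending order so that vectors compare worst-off first."""
    return sorted(utilities)

def leximin_best(candidates):
    """Return the candidate utility vector that is maximal in the leximin order.

    Python compares lists lexicographically, so the worst-off utility is
    compared first, then the second worst-off, and so on.
    """
    return max(candidates, key=leximin_key)

# Hypothetical utility profiles for three bidders.
candidates = [(1, 9, 9), (3, 4, 5), (3, 3, 10)]
best = leximin_best(candidates)
```

Here `(3, 4, 5)` is selected: it ties with `(3, 3, 10)` on the worst-off bidder but gives the second worst-off bidder more, even though `(1, 9, 9)` has the largest total utility.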
1728
Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning
Zhe Zhang, Xiaoyang Tan
One of the major challenges of current offline reinforcement learning research is to deal with the distribution shift problem caused by the change in state-action visitations for the new policy. To address this issue, we present a novel reward shifting-based method. Specifically, to regularize the behavior of the new policy at each state, we modify the reward to be received by the new policy by shifting it adaptively according to its proximity to the behavior policy, and apply the reward shifting in opposite directions for in-distribution actions and out-of-distribution ones. In this way we are able to guide the learning procedure of the new policy itself by influencing the consequence of its actions explicitly, helping it to achieve a better balance between behavior constraints and policy improvement. Empirical results on the popular D4RL benchmarks show that the proposed method obtains competitive performance compared to the state-of-the-art baselines.
List of keywords
Machine Learning -> ML: Reinforcement learning
1736
A Diffusion Model with Contrastive Learning for ICU False Arrhythmia Alarm Reduction
Feng Wu, Guoshuai Zhao, Xueming Qian, Li-wei Lehman
False arrhythmia alarms in intensive care units (ICUs) significantly disturb patients and medical staff, cause noise disturbances, and slow staff response times, leading to lower medical service quality. To alleviate false alarms in the ICU, previous works proposed rule-based methods and traditional machine learning methods. However, these methods are time-consuming and labor-intensive, and they struggle with high-dimensional, sparse, unbalanced, and limited data. To address the above issues, we propose a reconstruction model based on the conditional denoising diffusion model. The model generates real arrhythmia signals with characteristics of candidate samples and uses the distance between the generated samples and the original samples to judge the alarm type. We design a network with residual links and a self-attention mechanism to capture long-term dependencies existing in signal sequences, and leverage the contrastive learning mechanism to maximize mutual information between true arrhythmia alarms and false arrhythmia alarms. We demonstrate the effectiveness of our approach on the mimic arrhythmia dataset for determining the alarm in ventricular tachycardia and ventricular fibrillation situations. The code will be released on GitHub after the paper is accepted.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Applications
Machine Learning -> ML: Time series and data streams
1738
Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs
Yinchuan Li, Zhigang Li, Wenqian Li, Yunfeng Shao, Yan Zheng, Jianye Hao
Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions. However, these algorithms struggle to learn policy distributions that are proportional to rewards and have limited exploration capabilities. In this paper, we innovatively formulate the graph active learning problem as a generative process, named GFlowGNN, which generates various samples through sequential actions with probabilities precisely proportional to a predefined reward function. Furthermore, we propose the concept of flow nodes and flow features to efficiently model graphs as flows based on generative flow networks, where the policy network is trained with specially designed rewards. Extensive experiments on real datasets show that the proposed approach has good exploration capability and transferability, outperforming various state-of-the-art methods.
List of keywords
Machine Learning -> ML: Sequence and graph learning
1748
Deep Partial Multi-Label Learning with Graph Disambiguation
Haobo Wang, Shisong Yang, Gengyu Lyu, Weiwei Liu, Tianlei Hu, Ke Chen, Songhe Feng, Gang Chen
In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have been prevalent to deal with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.
List of keywords
Machine Learning -> ML: Multi-label
1749
Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach
Haoxuan Wang, Zhiding Yu, Yisong Yue, Animashree Anandkumar, Anqi Liu, Junchi Yan
We propose a framework for learning calibrated uncertainties under domain shifts, considering the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form that concerns the domain shift. In particular, the density ratio estimator yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our proposed method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show an agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-task and transfer learning
1756
Delegated Online Search
Pirmin Braun, Niklas Hahn, Martin Hoefer, Conrad Schecker
In a delegation problem, a principal P with commitment power tries to pick one out of n options. Each option is drawn independently from a known distribution. Instead of inspecting the options herself, P delegates the information acquisition to a rational and self-interested agent A. After inspection, A proposes one of the options, and P can accept or reject. In this paper, we study a natural online variant of delegation, in which the agent searches through the options in an online fashion. How can we design algorithms for P that approximate the utility of her best option in hindsight? We show that P can obtain a \Theta(1/n)-approximation and provide more fine-grained bounds independent of n based on two parameters. If the ratio of maximum and minimum utility for A is bounded by a factor \alpha, we obtain an \Omega(\log\log \alpha / \log \alpha)-approximation algorithm and show that this is best possible. If P cannot distinguish options with the same value for herself, we show that ratios polynomial in 1/\alpha cannot be avoided. If the utilities of P and A for each option are related by a factor \beta, we obtain an \Omega(1 / \log \beta)-approximation, and O(\log \log \beta / \log \beta) is best possible.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Agent-based and Multi-agent Systems -> MAS: Agent communication
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
1758
Progressive Label Propagation for Semi-Supervised Multi-Dimensional Classification
Teng Huang, Bin-Bin Jia, Min-Ling Zhang
In multi-dimensional classification (MDC), each training example is associated with multiple class variables from different class spaces. However, it is rather costly to collect labeled MDC examples which have to be annotated from several dimensions (class spaces). To reduce the labeling cost, we attempt to deal with the MDC problem under the semi-supervised learning setting. Accordingly, a novel MDC approach named PLAP is proposed to solve the resulting semi-supervised MDC problem. Overall, PLAP works under the label propagation framework to utilize unlabeled data. To further consider dependencies among class spaces, PLAP deals with each class space in a progressive manner, where the previous propagation results will be used to initialize the current propagation procedure and all processed class spaces and the current one will be regarded as an entirety. Experiments validate the effectiveness of the proposed approach.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning
1762
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1763
A New Variable Ordering for In-processing Bounded Variable Elimination in SAT Solvers
Shuolin Li, Chu-Min Li, Jordi Coll, Mao Luo, Djamal Habet, Felip Manya
Bounded Variable Elimination (BVE) is an important Boolean formula simplification technique in which the variable ordering is crucial. We define a new variable ordering based on variable activity, called ESA (variable Elimination Scheduled by Activity), for in-processing BVE in Conflict-Driven Clause Learning (CDCL) SAT solvers, and incorporate it in several state-of-the-art CDCL SAT solvers. Experimental results show that the new ESA ordering consistently makes these solvers solve more instances on a benchmark set including all instances used in the crafted, application and main tracks of all SAT Competitions up to 2022. The behaviour of ESA and the reasons for its effectiveness are also analyzed.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
1778
Learning to Self-Reconfigure for Freeform Modular Robots via Altruism Proximal Policy Optimization
Lei Wu, Bin Guo, Qiuyun Zhang, Zhuo Sun, Jieyi Zhang, Zhiwen Yu
The advantages of modular robot systems stem from their ability to change between different configurations, which enables them to adapt to complex and dynamic real-world environments. How to change the configuration of a modular robot system accurately and efficiently, i.e., the self-reconfiguration problem, is therefore essential. Existing reconfiguration algorithms are based on discrete motion primitives and are suitable for lattice-type modular robots. For freeform modular robots, the modules are connected without alignment and the motion space is continuous, which makes the existing reconfiguration methods infeasible. In this work, for freeform modular robots, we design a parallel distributed self-reconfiguration algorithm based on multi-agent reinforcement learning to realize the automatic design of conflict-free reconfiguration controllers in continuous action spaces. We introduce a collaborative mechanism into the reinforcement learning to avoid conflicts. Furthermore, we design distributed termination criteria to achieve timely termination under the condition of local observability and limited communication. Simulations show that, compared to the baselines, the proposed method improves efficiency and congruence, and the module movements show altruism.
List of keywords
Robotics -> ROB: Learning in robotics
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Robotics -> ROB: Multi-robot systems
1793
Proportionally Fair Online Allocation of Public Goods with Predictions
Siddhartha Banerjee, Safwan Hossain, Vasilis Gkatzelis, Billy Jin, Evi Micha, Nisarg Shah
We design online algorithms for fair allocation of public goods to a set of $N$ agents over a sequence of $T$ rounds and focus on improving their performance using predictions. In the basic model, a public good arrives in each round, and every agent reveals their value for it upon arrival. The algorithm must irrevocably decide the investment in this good without exceeding a total budget of $B$ across all rounds. The algorithm can utilize (potentially noisy) predictions of each agent’s total value for all remaining goods. The algorithm’s performance is measured using a \emph{proportional fairness} objective, which informally demands that every group of agents be rewarded proportional to its size and the cohesiveness of its preferences. We show that no algorithm can achieve better than $\Theta(T/B)$ proportional fairness without predictions. With reasonably accurate predictions, the situation improves significantly, and $\Theta(\log (T/B))$ proportional fairness is achieved. We also extend our results to a general setting wherein a batch of $L$ public goods arrive in each round and $O(\log (\min(N,L) \cdot T/B))$ proportional fairness is achieved. Our exact bounds are parameterized as a function of the prediction error, with performance degrading gracefully with increasing errors.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division
1796
Basket Representation Learning by Hypergraph Convolution on Repeated Items for Next-basket Recommendation
Yalin Yu, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang
Basket representation plays an important role in the task of next-basket recommendation. However, existing methods generally adopt pooling operations to learn a basket’s representation, from which two critical issues can be identified. First, they treat a basket as a set of independent and identically distributed items. By conducting data analysis on a real dataset, we find that items occurring in the same basket have much higher correlations than randomly selected items. Second, although some works have recognized the importance of items repeatedly purchased in multiple baskets, they ignore the correlations among the repeated items in the same basket, whose importance is shown by our data analysis. In this paper, we propose a novel Basket Representation Learning (BRL) model by leveraging the correlations among intra-basket items. Specifically, we first connect all the items (in a basket) as a hyperedge, where the correlations among different items can be well exploited by hypergraph convolution operations. Meanwhile, we also connect all the repeated items in the same basket as a hyperedge, whereby their correlations can be further strengthened. We generate a negative (positive) view of the basket by data augmentation on repeated (non-repeated) items, and apply contrastive learning to force more agreement on repeated items. Finally, experimental results on three real datasets show that our approach performs better than eight baselines in ranking accuracy.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Information retrieval
1798
Multi-Modality Deep Network for JPEG Artifacts Reduction
Xuhao Jiang, Weimin Tan, Qing Lin, Chenxi Ma, Bo Yan, Liquan Shen
In recent years, many convolutional neural network-based models have been designed for JPEG artifacts reduction and have achieved notable progress. However, few methods are suitable for extreme low-bitrate image compression artifacts reduction. The main challenge is that the highly compressed image loses too much information, making it difficult to reconstruct a high-quality image. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides the potential prior information of the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, show that our method can obtain better deblocking results compared to the state-of-the-art methods.
List of keywords
Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Machine learning for vision
1805
Totally Dynamic Hypergraph Neural Networks
Peng Zhou, Zongqian Wu, Xiangxiang Zeng, Guoqiu Wen, Junbo Ma, Xiaofeng Zhu
As an extension of graphs, hypergraphs can naturally represent multi-relationships and have great application prospects in real life. The static hypergraph neural network relies too much on the initialized hypergraph structure and cannot mine hidden relationships within the data; the dynamic hypergraph neural network optimizes the hypergraph structure in the process of model iteration and can mine more information. However, the existing dynamic hypergraph neural networks ignore the features of hyperedges and cannot adjust the number of hyperedges, which imposes limitations when adjusting hypergraphs. We propose a novel hypergraph neural network that can adjust the number of hyperedges while optimizing the hypergraph structure. Our method focuses on hyperedge features and learns their feature distribution rather than fixed hyperedge features. Hyperedges are obtained by sampling from the learned distribution, the hypergraph is then constructed according to the attention coefficients of the sampled hyperedges and nodes, and finally the node features are updated using the hypergraph convolution algorithm. Experimental results demonstrate the effectiveness of our method.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
1809
Expanding the Hyperbolic Kernels: A Curvature-aware Isometric Embedding View
Meimei Yang, Pengfei Fang, Hui Xue
Modeling data relation as a hierarchical structure has proven beneficial for many learning scenarios, and the hyperbolic space, with negative curvature, can encode such data hierarchy without distortion. Several recent studies also show that the representation power of the hyperbolic space can be further improved by endowing the kernel methods. Unfortunately, the known kernel methods, developed in hyperbolic space, are limited by the adaptation capacity or distortion issues. This paper addresses the issues through a novel embedding function. To this end, we propose a curvature-aware isometric embedding, which establishes an isometry from the Poincaré model to a special reproducing kernel Hilbert space (RKHS). Then we can further define a series of kernels on this RKHS, including several positive definite kernels and an indefinite kernel. Thorough experiments are conducted to demonstrate the superiority of our proposals over existing hyperbolic and Euclidean kernels in various learning tasks, e.g., zero-shot learning and graph learning.
List of keywords
Machine Learning -> ML: Kernel methods
Machine Learning -> ML: Geometric learning
1812
Quantifying Harm
Sander Beckers, Hana Chockler, Joseph Halpern
In [Beckers et al. 2022] a qualitative notion of harm is defined: either harm is caused, or it is not. For practical applications, we often need to quantify harm; for example, we may want to choose the least harmful of a set of possible interventions. We first present a quantitative definition of harm in a deterministic context involving a single individual, then we consider the issues involved in dealing with uncertainty regarding the context and going from a notion of harm for a single individual to a notion of “societal harm”, which involves aggregating the harm to individuals. We show that the “obvious” way of doing this (just taking the expected harm for an individual and then summing the expected harm over all individuals) can lead to counterintuitive or inappropriate answers, and discuss alternatives, drawing on work from the decision-theory literature.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Decision and utility theory
1813
A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation
Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen
Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previously structured map method provides the average historical appearance of visited nodes, while it ignores distinctive contributions of various images and potent information retention in the reasoning process. This work proposes a dual semantic-aware recurrent global-adaptive network (DSRG) to address the above problems. First, DSRG proposes an instruction-guidance linguistic module (IGL) and an appearance-semantics visual module (ASV) for boosting vision and language semantic learning respectively. For the memory mechanism, a global adaptive aggregation module (GAA) is devised for explicit panoramic observation fusion, and a recurrent memory fusion module (RMF) is introduced to supply implicit temporal hidden states. Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods. Code is available at https://github.com/CrystalSixone/DSRG.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Scene analysis and understanding   
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
1816
Cardinality-Minimal Explanations for Monotonic Neural Networks
Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks
In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Theory of deep learning
1817
Explainable Multi-Agent Reinforcement Learning for Temporal Queries
Kayla Boggess, Sarit Kraus, Lu Feng
As multi-agent reinforcement learning (MARL) systems are increasingly deployed throughout society, it is imperative yet challenging for users to understand the emergent behaviors of MARL agents in complex environments. This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query, which specifies a sequence of tasks completed by agents with possible cooperation. The proposed approach encodes the temporal query as a PCTL* logic formula and checks if the query is feasible under a given MARL policy via probabilistic model checking. Such explanations can help reconcile discrepancies between the actual and anticipated multi-agent behaviors. The proposed approach also generates correct and complete explanations to pinpoint reasons that make a user query infeasible. We have successfully applied the proposed approach to four benchmark MARL domains (up to 9 agents in one domain). Moreover, the results of a user study show that the generated explanations significantly improve user performance and satisfaction.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
1826
Levin Tree Search with Context Models
Laurent Orseau, Levi Lelis, Marcus Hutter
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories — guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik’s cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
List of keywords
Search -> S: Search and machine learning
Search -> S: Heuristic search
1829
Multi-Task Learning via Time-Aware Neural ODE
Feiyang YE, Xuehao Wang, Yu Zhang, Ivor Tsang
Multi-Task Learning (MTL) is a well-established paradigm for learning shared models for a diverse set of tasks. Moreover, MTL improves data efficiency by jointly training all tasks simultaneously. However, directly optimizing the losses of all the tasks may lead to imbalanced performance across the tasks due to the competition among tasks for the shared parameters in MTL models. Many MTL methods try to mitigate this problem by dynamically weighting task losses or manipulating task gradients. Different from existing studies, in this paper, we propose a Neural Ordinary diffeRential equation based Multi-tAsk Learning (NORMAL) method to alleviate this issue by modeling task-specific feature transformations from the perspective of dynamic flows built on the Neural Ordinary Differential Equation (NODE). Specifically, the proposed NORMAL model designs a time-aware neural ODE block to learn task-specific time information, which determines task positions of feature transformations in the dynamic flow, in NODE automatically via gradient descent methods. In this way, the proposed NORMAL model handles the problem of competing shared parameters by learning task positions. Moreover, the learned task positions can be used to evaluate the relevance of different tasks. Extensive experiments show that the proposed NORMAL model outperforms state-of-the-art MTL models.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
1830
New Bounds and Constraint Programming Models for the Weighted Vertex Coloring Problem
Olivier Goudet, Cyril Grelier, David Lesaint
This paper addresses the weighted vertex coloring problem (WVCP) which is an NP-hard variant of the graph coloring problem with various applications. Given a vertex-weighted graph, the problem consists of partitioning vertices in independent sets (colors) so as to minimize the sum of the maximum weights of the colors. We first present an iterative procedure to reduce the size of WVCP instances and prove new upper bounds on the objective value and the number of colors needed to construct optimal solutions. Alternative constraint programming models are then introduced which rely on primal and dual encodings of the problem and use symmetry-breaking constraints. A large number of experiments are conducted on benchmark instances. We analyze the impact of using specific bounds to reduce the search space and speed up the exact resolution of instances. New optimality proofs are reported for some benchmark instances.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Search -> S: Combinatorial search and optimisation
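As a small illustration of the WVCP objective stated in the abstract above (the sum, over colors, of the maximum vertex weight in each color), the following Python sketch evaluates a given coloring. It is not taken from the paper: the function name `wvcp_cost`, the example graph, and its weights are hypothetical, and weights are assumed to be non-negative.

```python
from collections import defaultdict


def wvcp_cost(edges, weights, coloring):
    """Return the WVCP objective of `coloring`; raise if the coloring is not proper."""
    # Every color class must be an independent set.
    for u, v in edges:
        if coloring[u] == coloring[v]:
            raise ValueError(f"vertices {u} and {v} are adjacent but share a color")
    # Assumes non-negative weights, so 0 is a safe starting maximum.
    heaviest = defaultdict(int)  # color -> maximum weight seen in that color
    for vertex, color in coloring.items():
        heaviest[color] = max(heaviest[color], weights[vertex])
    return sum(heaviest.values())


# Hypothetical 4-vertex path a-b-c-d with made-up weights.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
weights = {"a": 5, "b": 1, "c": 4, "d": 2}
two_colors = {"a": 0, "b": 1, "c": 0, "d": 1}
three_colors = {"a": 0, "b": 1, "c": 0, "d": 2}

print(wvcp_cost(edges, weights, two_colors))    # max(5, 4) + max(1, 2) = 7
print(wvcp_cost(edges, weights, three_colors))  # 5 + 1 + 2 = 8
```

Unlike plain graph coloring, the quantity being minimized depends on which vertices share a color, not only on how many colors are used.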
1831
Norm Deviation in Multiagent Systems: A Foundation for Responsible
Amika Singh, Munindar Singh
The power of norms in both human societies and sociotechnical systems arises from the facts that (1) norms characterize acceptable behavior in high-level terms and (2) they are not hard controls and can be deviated from. Thus, the design of autonomous agents faces an essential tension: these agents must both (1) respect applicable societal norms, including laws and policies, and (2) deviate from those norms when blindly following them may lead to diminished outcomes. We propose a conceptual foundation for norm deviation. As a guiding framework, we adopt Habermas’s theory of communicative action comprising objective, subjective, and practical validity claims regarding the suitability of such deviation. Our analysis thus goes beyond previous studies of norm deviation and yields principles uniting norms and values by which to develop effective autonomous agents.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Normative systems
1833
Probabilistic Masked Attention Networks for Explainable Sequential Recommendation
Huiyuan Chen, Kaixiong Zhou, Zhimeng Jiang, Michael Yeh, Xiaoting Li, Menghai Pan, Yan Zheng, Xia Hu, Hao Yang
The recently proposed Transformer-based models are highly powerful for modeling temporal dynamics of user preference in sequential recommendation. Most variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attentions inevitably assign probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. To tackle these issues, we propose a Probabilistic Masked Attention Network (PMAN) to identify the sparse pattern of attentions, which is more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attentions under a constrained optimization framework. As such, PMAN allows selecting which information is critical to be retained or dropped in a data-driven fashion. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly, and the performance gain becomes larger for more noisy sequences. Our code and data are available at https://anonymous.4open.science/r/PMAN_Rec-E72E.
List of keywords
Data Mining -> DM: Collaborative filtering
Data Mining -> DM: Information retrieval
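To make the dense-versus-sparse attention contrast in the PMAN abstract concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the mask is a fixed, hand-written binary vector, whereas the abstract describes a probabilistic mask learned under constrained optimization, and the names `softmax` and `masked_softmax` and the scores are hypothetical.

```python
import math


def softmax(scores):
    """Dense softmax: every position receives strictly positive probability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(scores, mask):
    """Softmax over positions where mask is 1; masked positions get exactly 0."""
    kept = [s for s, keep in zip(scores, mask) if keep]
    if not kept:
        raise ValueError("mask must keep at least one position")
    m = max(kept)
    exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores for four items; items 1 and 3 are treated as noise.
scores = [2.0, 0.5, 1.0, -1.0]
mask = [1, 0, 1, 0]
dense = softmax(scores)
sparse = masked_softmax(scores, mask)
print(dense)
print(sparse)
```

With the dense softmax, the items treated as noise still receive some attention weight; with the mask, their weight is exactly zero and the remaining mass is renormalized over the kept items.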
1834
A Unifying Formal Approach to Importance Values in Boolean Functions
Hans Harder, Clemens Dubslaff, Christel Baier, Simon Jantsch
Boolean functions and their representation through logics, circuits, AI classifiers, or binary decision diagrams (BDDs) play a central role in the design and analysis of computing systems. Quantifying the relative impact of variables on the truth value by means of importance values can provide useful insights to steer system design and debugging. In this paper, we introduce a uniform framework for reasoning about importance values by a generic notion of importance value functions (IVFs). IVFs are identified by a set of axioms that are motivated from several notions of importance values introduced in the literature, including Ben-Or and Linial’s influence and Chockler, Halpern, and Kupferman’s notion of responsibility and blame. We establish a connection of IVFs to game-theoretic concepts such as Shapley and Banzhaf values that measure the impact of players on outcomes in coalition games. Exploiting BDD-based symbolic methods and projected model counting, we devise and evaluate practical computation schemes for IVFs.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Cooperative games
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Constraint Satisfaction and Optimization -> CSO: Satisfiability
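One of the importance notions mentioned in the abstract above, the influence of a variable, can be computed by brute force for small functions: it is the fraction of assignments on which flipping that variable changes the function value. The Python sketch below is illustrative only. It enumerates all assignments rather than using the BDD-based or model-counting schemes the abstract refers to, and the function names and example functions are hypothetical.

```python
from itertools import product


def influence(f, n, i):
    """Fraction of the 2**n assignments on which flipping variable i changes f."""
    changed = 0
    for bits in product((False, True), repeat=n):
        flipped = list(bits)
        flipped[i] = not flipped[i]
        if f(*bits) != f(*flipped):
            changed += 1
    return changed / 2 ** n


def majority(a, b, c):
    """True when at least two of the three inputs are true."""
    return (a + b + c) >= 2


def and_ignoring_third(a, b, c):
    """Conjunction of the first two inputs; the third input is irrelevant."""
    return a and b


print([influence(majority, 3, i) for i in range(3)])
print([influence(and_ignoring_third, 3, i) for i in range(3)])
```

In the majority function every variable matters exactly when the other two disagree, so all three get the same value, while the ignored variable of the second function gets influence zero.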
1836
Learning constraint networks over unknown constraint languages
Christian Bessiere, Clement Carbonnel, Areski Himeur
Constraint acquisition is the task of learning a constraint network from examples of solutions and non-solutions. Existing constraint acquisition systems typically require advance knowledge of the target network’s constraint language, which significantly narrows their scope of applicability. In this paper we propose a constraint acquisition method that computes a suitable constraint language as part of the learning process, eliminating the need for any advance knowledge. We report preliminary experiments on various acquisition benchmarks.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Constraint Satisfaction and Optimization -> CSO: Constraint programming
1838
On Translations between ML Models for XAI Purposes
Alexis de Colnet, Pierre Marquis
In this paper, the succinctness of various ML models is studied. To be more precise, the existence of polynomial-time and polynomial-space translations between representation languages for classifiers is investigated. The languages that are considered include decision trees, random forests, several types of boosted trees, binary neural networks, Boolean multilayer perceptrons, and various logical representations of binary classifiers. We provide a complete map indicating for every pair of languages C, C’ whether or not a polynomial-time / polynomial-space translation exists from C to C’. We also explain how to take advantage of the resulting map for XAI purposes.
List of keywords
Knowledge Representation and Reasoning -> KRR: Knowledge compilation
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
1842
Game Theory with Simulation of Other Players
Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer
Game-theoretic interactions with AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to simulate an AI agent (for example because its source code is known), which allows others to accurately predict the agent’s actions. This could lower the bar for trust and cooperation. In this paper, we first formally define games in which one player can simulate another at a cost, and derive some basic properties of such games. Then, we prove a number of results for such games, including: (1) introducing simulation into generic-payoff normal-form games makes them easier to solve; (2) if the only obstacle to cooperation is a lack of trust in the possibly-simulated agent, simulation enables equilibria that improve the outcome for both agents; and (3) however, there are settings where introducing simulation results in strictly worse outcomes for both players.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1846
Gapformer: Graph Transformer with Graph Pooling for Node Classification
Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu
Graph Transformers (GTs) have proved their advantage in graph-level tasks. However, existing GTs still perform unsatisfactorily on the node classification task due to 1) the overwhelming unrelated information obtained from a vast number of irrelevant distant nodes and 2) the quadratic complexity regarding the number of nodes via the fully connected attention mechanism. In this paper, we present Gapformer, a method for node classification that deeply incorporates Graph Transformer with Graph Pooling. More specifically, Gapformer coarsens the large-scale nodes of a graph into a smaller number of pooling nodes via local or global graph pooling methods, and then computes the attention solely with the pooling nodes rather than all other nodes. In such a manner, the negative influence of the overwhelming unrelated nodes is mitigated while maintaining the long-range information, and the quadratic complexity is reduced to linear complexity with respect to the fixed number of pooling nodes. Extensive experiments on 13 node classification datasets, including homophilic and heterophilic graph datasets, demonstrate the competitive performance of Gapformer over existing Graph Neural Networks and GTs.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
1850
MultiPar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations
Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won Park
As we move closer to real-world AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is the Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behavior.
List of keywords
Machine Learning -> ML: Attention models
Computer Vision -> CV: Video analysis and understanding   
Humans and AI -> HAI: Computer-aided education
1856
Towards a Better Understanding of Learning with Multiagent Teams
David Radke, Kyle Tilbury, Kate Larson, Tim Brecht
While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
1868
Anticipatory Fictitious Play
Alex Cloud
Fictitious play is an algorithm for computing Nash equilibria of matrix games. Recently, machine learning variants of fictitious play have been successfully applied to complicated real-world games. This paper presents a simple modification of fictitious play which is a strict improvement over the original: it has the same theoretical worst-case convergence rate, is equally applicable in a machine learning context, and enjoys superior empirical performance. We conduct an extensive comparison of our algorithm with fictitious play, proving an optimal $O(t^{-1})$ convergence rate for certain classes of games, demonstrating superior performance numerically across a variety of games, and concluding with experiments that extend these algorithms to the setting of deep multiagent reinforcement learning.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
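For readers unfamiliar with the baseline discussed in the abstract above, the following Python sketch runs classical fictitious play on a zero-sum matrix game. It shows the original algorithm only, not the anticipatory modification the paper proposes. The simultaneous-update variant, the lowest-index tie-breaking rule, and the names `fictitious_play` and `exploitability` are assumptions made for this illustration.

```python
def fictitious_play(payoff, steps):
    """Run classical fictitious play on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives its negative.
    Returns the empirical mixed strategies of both players after `steps` rounds.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(steps):
        # Each player best-responds to the opponent's action counts so far.
        # Counts are proportional to empirical frequencies; ties go to the lowest index.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_action = max(range(n_rows), key=row_values.__getitem__)
        col_action = min(range(n_cols), key=col_values.__getitem__)
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    x = [c / steps for c in row_counts]
    y = [c / steps for c in col_counts]
    return x, y


def exploitability(payoff, x, y):
    """Best-response payoff against y minus best-response payoff against x (0 at equilibrium)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols))
    return best_row - best_col


matching_pennies = [[1, -1], [-1, 1]]
x, y = fictitious_play(matching_pennies, 10000)
gap = exploitability(matching_pennies, x, y)
print(x, y, gap)
```

In matching pennies the unique equilibrium mixes both actions equally, so the empirical strategies should drift toward (0.5, 0.5) and the exploitability gap should shrink as the number of rounds grows.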
1875
Advancing Post-Hoc Case-Based Explanation with Feature Highlighting
Eoin Kenny, Eoin Delaney, Mark T. Keane
Explainable AI (XAI) has been proposed as a valuable tool to assist in downstream tasks involving human-AI collaboration. Perhaps the most psychologically valid XAI techniques are case-based approaches which display “whole” exemplars to explain the predictions of black-box AI systems. However, for such post-hoc XAI methods dealing with images, there has been no attempt to improve their scope by using multiple clear feature “parts” of the images to explain the predictions while linking back to relevant cases in the training data, thus allowing for more comprehensive explanations that are faithful to the underlying model. In this work, we address this gap by proposing two general algorithms (latent and superpixel-based) which can isolate multiple clear feature “parts” in a test image, and then connect them to the explanatory cases found in the training data, before testing their effectiveness in a carefully designed user study. Results demonstrate that the proposed algorithms appropriately calibrate a user’s feelings of correctness for ambiguous classifications in real world data on the ImageNet dataset.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
1883
On the Paradox of Learning to Reason from Data
Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, Guy Van den Broeck
Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has, in fact, learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models (e.g. T5) and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.
List of keywords
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
1884
Improve Video Representation with Temporal Adversarial Augmentation
Jinhao Duan, Quanfu Fan, Hao Cheng, Xiaoshuang Shi, Kaidi Xu
Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) if used in an appropriate manner. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks with respect to video clips by maximizing a temporal-related loss function. We demonstrate that videos augmented by TA will obtain diverse temporal views, which significantly impact the focus of neural networks. Training with these examples remedies the flaw of unbalanced temporal information perception and enhances the ability to defend against temporal shifts, ultimately leading to better generalization. To leverage TA, we propose the Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-something V1&V2 and diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models with notable margins, e.g., training TSM with TAF achieves consistent improvements on Something-something V1 (+1.3%) and V2 (+0.9%), without introducing additional parameters or computational costs. As a byproduct, TAF also improves the robustness under out-of-distribution (OOD) settings.
List of keywords
Computer Vision -> CV: Video analysis and understanding   
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning
1889
One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction
Jan Tönshoff, Berke Kisin, Jakob Lindner, Martin Grohe
We propose a universal Graph Neural Network architecture which can be trained as an end-2-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem specific heuristics for any CSP in a purely data driven manner. The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs. We perform a thorough empirical evaluation where we learn heuristics for well known and important CSPs, both decision and optimisation problems, from random data, including graph coloring, MAXCUT, and MAX-k-SAT, and the general RB model. Our approach significantly outperforms prior end-2-end approaches for neural combinatorial optimization. It can compete with conventional heuristics and solvers on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.
List of keywords
Machine Learning -> ML: Reinforcement learning
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Machine Learning -> ML: Sequence and graph learning
1918
Contrastive Learning for Sign Language Recognition and Translation
Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Kang Xia, Lei Xie, Sanglu Lu
There are two problems that widely exist in current end-to-end sign language processing architectures. One is the CTC spike phenomenon, which weakens the visual representational ability in Continuous Sign Language Recognition (CSLR). The other is the exposure bias problem, which leads to the accumulation of translation errors during inference in Sign Language Translation (SLT). In this paper, we tackle these issues by introducing contrastive learning, aiming to enhance both visual-level feature representation and semantic-level error tolerance. Specifically, to alleviate the CTC spike phenomenon and enhance visual-level representation, we design a visual contrastive loss by minimizing the visual feature distance between different augmented samples of frames in one sign video, so that the model can further explore features by utilizing numerous unlabeled frames in an unsupervised way. To alleviate the exposure bias problem and improve semantic-level error tolerance, we design a semantic contrastive loss by re-inputting the predicted sentence into the semantic module and comparing features of the ground-truth sequence and the predicted sequence, exposing the model to its own mistakes. Besides, we propose two new metrics, i.e., Blank Rate and Consecutive Wrong Word Rate, to directly reflect our improvement on the two problems. Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Action and behavior recognition
1920
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
Cristian-Paul BARA, Ziqiao Ma, Yingzhuo Yu, Julie Shah, Joyce Chai
Collaborative tasks often begin with partial task knowledge and incomplete plans from each partner. To complete these tasks, partners need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal. While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collaboration. To address this limitation, this paper takes a step towards Collaborative Plan Acquisition, where humans and agents strive to learn and communicate with each other to acquire a complete plan for joint tasks. Specifically, we formulate a novel problem for agents to predict the missing task knowledge for themselves and for their partners based on rich perceptual and dialogue history. We extend a situated dialogue benchmark for symmetric collaborative tasks in a 3D blocks world and investigate computational strategies for plan acquisition. Our empirical results suggest that predicting the partner’s missing knowledge is a more viable approach than predicting one’s own. We show that explicit modeling of the partner’s dialogue moves and mental states produces improved and more stable results than without. These results provide insight for future AI agents that can predict what knowledge their partner is missing and, therefore, can proactively communicate such information to help the partner acquire such missing knowledge toward a common understanding of joint tasks.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
1927
CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization
Xiaohan Yu, Jun Wang, Yongsheng Gao
Ultra-fine-grained visual classification (ultra-FGVC) targets the classification of sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representation. Yet none of these works consider explicit supervision for learning mutual information at instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address the fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves the generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Applications
1932
Incorporating Unlikely Negative Cues for Distinctive Image Captioning
Zhengcong Fei, Junshi Huang
Recent neural image captioning models have achieved promising results on some automatic metrics, yet suffer badly from the generic sentence problem, limiting their applications to a few toy scenarios. An interesting approach, namely negative training, has been proposed to remind the model not to generate high-frequency but meaningless sentences. However, its usability in image captioning is hindered by one issue: considering only the frequency perspective ignores low-frequency but generic and vague sentences, especially when facing diversified visual scenes. In this paper, we propose to incorporate unlikely negative knowledge into image captioning, to keep the model away from undesirable generic descriptions while avoiding the above problems. Specifically, we first train a negative teacher model that can produce image-wise generic sentences with retrieval entropy-filtered data, and then the student model is required to maximize the distance with multi-level negative knowledge transferring. Empirical results on the MS COCO benchmark verify that our plug-and-play unlikely negative framework shows a significant performance gain in both accuracy and diversity, compared to previous state-of-the-art distinctive image captioning methods.
List of keywords
Computer Vision -> CV: Vision and language 
Machine Learning -> ML: Learning preferences or rankings
1938
RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation
Qucheng Peng, Zhengming Ding, Lingjuan Lyu, Lichao Sun, Chen Chen
Source-free domain adaptation transfers the source-trained model to the target domain without exposing the source data, aiming to dispel concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the black-box setting allows only the outputs of the source model to be used, but it suffers even more severely from overfitting on the source domain because the source model’s weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for black-box domain adaptation from both input-level and network-level regularization. For the input level, we design a new data augmentation technique, Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. For the network level, we develop a Subnetwork Distillation mechanism to transfer knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1953
Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking
Chamath Abeysinghe, Chris Reid, Hamid Rezatofighi, Bernd Meyer
Tracking individuals is a vital part of many experiments conducted to understand collective behaviour. Ants are the paradigmatic model system for such experiments but their lack of individually distinguishing visual features and their high colony densities make it extremely difficult to perform reliable tracking automatically. Additionally, the wide diversity of their species’ appearances makes a generalized approach even harder. In this paper, we propose a data-driven multi-object tracker that, for the first time, employs domain adaptation to achieve the required generalisation. This approach is built upon a joint-detection-and-tracking framework that is extended by a set of domain discriminator modules integrating an adversarial training strategy in addition to the tracking loss. In addition to this novel domain-adaptive tracking framework, we present a new dataset and a benchmark for the ant tracking problem. The dataset contains 57 video sequences with full trajectory annotation, including 30k frames captured from two different ant species moving on different background patterns. It comprises 33 and 24 sequences for source and target domains, respectively. We compare our proposed framework against other domain-adaptive and non-domain-adaptive multi-object tracking baselines using this dataset and show that incorporating domain adaptation at multiple levels of the tracking pipeline yields significant improvements. The code and the dataset are available at https://blinded.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Applications
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
1957
Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies
Rubens Moraes, David Aleixo, Lucas Nascimento Ferreira, Levi Lelis
This paper introduces Local Search (2L), a novel algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Other learning methods, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Computer games
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Search -> S: Local search
1966
Inferring Private Valuations from Behavioral Data in Bilateral Sequential Bargaining
Lvye Cui, Haoran Yu
Inferring bargainers’ private valuations on items from their decisions is crucial for analyzing their strategic behaviors in bilateral sequential bargaining. Most existing approaches that infer agents’ private information from observable data either rely on strong equilibrium assumptions or require a careful design of agents’ behavior models. To overcome these weaknesses, we propose a Bayesian Learning-based Valuation Inference (BLUE) framework. Our key idea is to derive feasible intervals of bargainers’ private valuations from their behavior data, using the fact that most bargainers do not choose strictly dominated strategies. We leverage these feasible intervals to guide our inference. Specifically, we first model each bargainer’s behavior function (which maps his valuation and bargaining history to decisions) via a recurrent neural network. Second, we learn these behavior functions by utilizing a novel loss function defined based on feasible intervals. Third, we derive the posterior distributions of bargainers’ valuations according to their behavior data and learned behavior functions. Moreover, we account for the heterogeneity of bargainer behaviors, and propose a clustering algorithm (K-Loss) to improve the efficiency of learning these behaviors. Experiments on both synthetic and real bargaining data show that our inference approach outperforms baselines.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Other
1983
XFormer: Fast and Accurate Monocular 3D Body Capture
Lihui Qian, Xintong Han, Faqiang Wang, Hongyu Liu, Haoye Dong, Zhiwen Li, Huawei Wei, Zhe Lin, Cheng-Bin Jin
We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. Our architecture is designed so that we can train on various types of datasets, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images. This effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs at over 30fps on a single CPU core and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
List of keywords
Computer Vision -> CV: 3D computer vision
2002
A Novel Learnable Interpolation Approach for Scale-Arbitrary Image Super-Resolution
Jiahao Chao, Zhou Zhou, Hongfan Gao, Jiali Gong, Zhenbing Zeng, Zhengfeng Yang
Deep convolutional neural networks (CNNs) have achieved unprecedented success in single image super-resolution over the past few years. Meanwhile, there is an increasing demand for single image super-resolution with arbitrary scale factors in real-world scenarios. Many approaches adopt scale-specific multi-path learning to cope with multi-scale super-resolution with a single network. However, these methods require a large number of parameters. To achieve a better balance between reconstruction quality and parameter count, we propose a learnable interpolation method that leverages the advantages of neural networks and interpolation methods to tackle the scale-arbitrary super-resolution task. The scale factor is treated as a function parameter for generating the kernel weights for the learnable interpolation. We demonstrate that the learnable interpolation builds a bridge between neural networks and traditional interpolation methods. Experiments show that the proposed learnable interpolation requires far fewer parameters and outperforms state-of-the-art super-resolution methods.
List of keywords
Computer Vision -> CV: Other
Machine Learning -> ML: Convolutional networks
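For readers unfamiliar with the traditional side of the bridge this abstract describes, the sketch below resamples a 1-D signal by an arbitrary scale factor with plain linear interpolation. It is not the paper's learnable method: the function name `resize_linear_1d` and the pixel-centre alignment convention are assumptions made for illustration, and the kernel weights here are a fixed function of the fractional offset rather than something generated by a network.

```python
import numpy as np

def resize_linear_1d(signal, scale):
    """Resample a 1-D signal by an arbitrary positive scale factor (linear interpolation)."""
    signal = np.asarray(signal, dtype=float)
    n_out = max(1, int(round(len(signal) * scale)))
    # Map each output sample centre back to a source coordinate (assumed convention).
    src = (np.arange(n_out) + 0.5) / scale - 0.5
    src = np.clip(src, 0, len(signal) - 1)
    left = np.floor(src).astype(int)
    right = np.minimum(left + 1, len(signal) - 1)
    w_right = src - left  # interpolation weight from the fractional offset
    return (1.0 - w_right) * signal[left] + w_right * signal[right]

upsampled = resize_linear_1d([0.0, 10.0], 2.0)
```

Because the scale enters only through the source coordinates, the same function handles non-integer factors such as 1.5 without any change.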
2005
Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint
Canh Pham, Tan Tran, Dung Ha, My T. Thai
This work, for the first time, introduces two constant-factor approximation algorithms, DLA and RLA, with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint. DLA is a deterministic algorithm that provides an approximation factor of $6+\epsilon$, while RLA is a randomized algorithm with an approximation factor of $4+\epsilon$. Both have query complexity $O(n \log(1/\epsilon)/\epsilon)$. The key idea to obtain a constant approximation ratio with linear queries lies in: (1) dividing the ground set into two appropriate subsets to find the near-optimal solution over these subsets with linear queries; and (2) combining a threshold greedy with properties of two disjoint sets or a random selection process to improve solution quality. In addition to the theoretical analysis, we have evaluated our proposed solutions with three applications: Revenue Maximization, Image Summarization, and Maximum Weighted Cut, showing that our algorithms not only return comparable results to state-of-the-art algorithms but also require significantly fewer queries.
List of keywords
Machine Learning -> ML: Optimization
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
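The threshold-greedy ingredient mentioned in this abstract can be illustrated with a single density-threshold pass under a knapsack budget. This is only a generic sketch, not DLA or RLA, and it carries no approximation guarantee: the helper names, the toy graph, the costs, and the threshold `tau` are all hypothetical. A weighted cut function is used because it is a standard example of a non-monotone submodular objective.

```python
def cut_value(edges, chosen):
    """Weight of edges with exactly one endpoint in `chosen` (non-monotone submodular)."""
    return sum(w for u, v, w in edges if (u in chosen) != (v in chosen))

def threshold_greedy(elements, cost, f, budget, tau):
    """One pass: add e if it fits the budget and its marginal gain per unit cost is >= tau."""
    chosen, spent = set(), 0.0
    for e in elements:
        if spent + cost[e] > budget:
            continue  # element does not fit in the remaining budget
        gain = f(chosen | {e}) - f(chosen)
        if gain / cost[e] >= tau:
            chosen.add(e)
            spent += cost[e]
    return chosen

# Hypothetical toy instance: a weighted triangle with per-vertex costs.
edges = [("a", "b", 3.0), ("b", "c", 2.0), ("a", "c", 1.0)]
cost = {"a": 1.0, "b": 2.0, "c": 1.0}
picked = threshold_greedy(["a", "b", "c"], cost,
                          lambda s: cut_value(edges, s), budget=2.0, tau=1.0)
```

Each element is examined once and triggers a constant number of evaluations of `f`, which is the sense in which a single pass of this kind uses a number of queries linear in the ground-set size.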
2012
BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning
Yunchao Yang, Yipeng Zhou, Miao Hu, Di Wu, Quan Z. Sheng
Federated learning (FL) is a promising distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, making the optimal reward budget allocation complicated. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization named BARA (Budget Allocation for Reverse Auction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records, so that the reward budget allocated to each communication round is dynamically optimized to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.
List of keywords
Machine Learning -> ML: Federated learning
2015
Reinforcement Learning Approaches for Traffic Signal Control under Missing Data
Hao Mei, Junxian Li, Bin Shi, Hua Wei
Reinforcement learning (RL) methods for traffic signal control have achieved better performance than conventional rule-based approaches. Most RL approaches require observations of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observations of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observations. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network have no sensors installed and thus no direct observations around them. To the best of our knowledge, we are the first to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first one imputes the traffic states to enable adaptive control, and the second one imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also provide further investigations on how missing data influences the performance of our model.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Reinforcement learning
2038
LION: Label Disambiguation for Semi-supervised Facial Expression Recognition with Progressive Negative Learning
Zhongjing Du, Xu Jiang, Peng Wang, Zhou Qizheng, Xi Wu, Jiliu Zhou, Yan Wang
Semi-supervised deep facial expression recognition (SS-DFER) has recently attracted rising research interest due to its more practical setting of abundant unlabeled data. However, two main problems remain unaddressed in current SS-DFER methods: 1) label ambiguity, i.e., given labels that mismatch the facial expressions; 2) inefficient utilization of low-confidence unlabeled data. In this paper, we propose a novel SS-DFER method, including a Label DIsambiguation module and a PrOgressive Negative Learning module, namely LION, to simultaneously address both problems. Specifically, the label disambiguation module operates on labeled data, including data with accurate labels (clear data) and ambiguous labels (ambiguous data). It first uses clear data to calculate prototypes for all the expression classes, and then re-assigns a candidate label set to all the ambiguous data. Based on the prototypes and the candidate label set, the ambiguous data can be relabeled more accurately. As for low-confidence unlabeled data, the progressive negative learning module is developed to iteratively mine more complete complementary labels, which can guide the model to reduce the association between data and corresponding complementary labels. Experiments on three challenging datasets show that our method significantly outperforms the current state-of-the-art approaches in SS-DFER and surpasses fully-supervised baselines. Code will be available at https://github.com/NUM-7/LION.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Applications
2045
FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment
JiaHao Liu, Jiang Wu, Jinyu Chen, Miao Hu, Yipeng Zhou, Di Wu
Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such methods require clients to download others’ models. This not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by the local one-step-ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets, and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.
List of keywords
Machine Learning -> ML: Federated learning
2048
Video Diffusion Models with Local-Global Context Guidance
Siyuan Yang, Lu Zhang, Yu Liu, Zhizhuo Jiang, You He
Diffusion models have emerged as a powerful paradigm in video synthesis tasks including prediction, generation, and interpolation. Due to the limitation of the computational budget, existing methods usually implement conditional diffusion models with an autoregressive inference pipeline, in which the future fragment is predicted based on the distribution of adjacent past frames. However, only the conditions from a few previous frames can’t capture the global temporal coherence, leading to inconsistent or even outrageous results in long-term video prediction. In this paper, we propose a Local-Global Context guided Video Diffusion model (LGC-VD) to capture multi-perception conditions for producing high-quality videos in both conditional/unconditional settings. In LGC-VD, the UNet is implemented with stacked residual blocks with self-attention units, avoiding the undesirable computational cost in 3D Conv. We construct a local-global context guidance strategy to capture the multi-perceptual embedding of the past fragment to boost the consistency of future prediction. Furthermore, we propose a two-stage training strategy to alleviate the effect of noisy frames for more stable predictions. Our experiments demonstrate that the proposed method achieves favorable performance on video prediction, interpolation, and unconditional video generation.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Computer Vision -> CV: Video analysis and understanding   
2051
Continuous-Time Graph Learning for Cascade Popularity Prediction
Xiaodong Lu, Shuo Ji, Le Yu, Leilei Sun, Bowen Du, Tongyu Zhu
Information propagation on social networks could be modeled as cascades, and many efforts have been made to predict the future popularity of cascades. However, most of the existing research treats a cascade as an individual sequence. Actually, the cascades might be correlated with each other due to the shared users or similar topics. Moreover, the preferences of users and semantics of a cascade are usually continuously evolving over time. In this paper, we propose a continuous-time graph learning method for cascade popularity prediction, which first connects different cascades via a universal sequence of user-cascade and user-user interactions and then chronologically learns on the sequence by maintaining the dynamic states of users and cascades. Specifically, for each interaction, we present an evolution learning module to continuously update the dynamic states of the related users and cascade based on their currently encoded messages and previous dynamic states. We also devise a cascade representation learning component to embed the temporal information and structural information carried by the cascade. Experiments on real-world datasets demonstrate the superiority and rationality of our approach.
List of keywords
Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Mining spatial and/or temporal data
2072
Learnable Surrogate Gradient for Direct Training Spiking Neural Networks
Shuang Lian, Jiangrong Shen, Qianhui Liu, Ziming Wang, Rui Yan, Huajin Tang
Spiking neural networks (SNNs) have increasingly drawn massive research attention due to biological interpretability and efficient computation. Recent achievements are devoted to utilizing the surrogate gradient (SG) method to avoid the dilemma of non-differentiability of spiking activity to directly train SNNs by backpropagation. However, the fixed width of the SG leads to gradient vanishing and mismatch problems, thus limiting the performance of directly trained SNNs. In this work, we propose a novel perspective to unlock the width limitation of SG, called the learnable surrogate gradient (LSG) method. The LSG method modulates the width of SG according to the change of the distribution of the membrane potentials, which is identified to be related to the decay factors based on our theoretical analysis. Then we introduce the trainable decay factors to implement the LSG method, which can optimize the width of SG automatically during training to avoid the gradient vanishing and mismatch problems caused by the limited width of SG. We evaluate the proposed LSG method on both image and neuromorphic datasets. Experimental results show that the LSG method can effectively alleviate the blocking of gradient propagation caused by the limited width of SG when training deep SNNs directly. Meanwhile, the LSG method can help SNNs achieve competitive performance on both latency and accuracy.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems
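To make the role of the surrogate-gradient width in this abstract concrete, the sketch below contrasts a narrow and a wide rectangular surrogate for the non-differentiable spike function. It is not the LSG method itself: the rectangular shape, the threshold of 1.0, and the function names are assumptions for illustration, and the learnable coupling between width and decay factors is not modeled.

```python
import numpy as np

def spike(u, v_th=1.0):
    """Heaviside spike: 1 where the membrane potential reaches the threshold."""
    return (u >= v_th).astype(float)

def rectangular_surrogate_grad(u, v_th=1.0, width=1.0):
    """Surrogate for d(spike)/du: 1/width inside a window centred on v_th, 0 outside."""
    return (np.abs(u - v_th) < width / 2).astype(float) / width

# Hypothetical membrane potentials for four neurons.
u = np.array([0.0, 0.8, 1.1, 2.0])
narrow = rectangular_surrogate_grad(u, width=0.5)
wide = rectangular_surrogate_grad(u, width=3.0)
```

With the narrow window, potentials far from the threshold receive exactly zero gradient, which is the vanishing behaviour a fixed width can cause; the wide window passes a smaller gradient to every neuron in this example.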
2077
Fluid Dynamics-Inspired Network for Infrared Small Target Detection
Tianxiang Chen, Qi Chu, Bin Liu, Nenghai Yu
Most infrared small target detection (ISTD) networks focus on building effective neural blocks or feature fusion modules, but none describes the ISTD process from the image evolution perspective. The directional evolution of image pixels influenced by convolution, pooling and surrounding pixels is analogous to the movement of fluid elements constrained by surrounding variables and particles. Inspired by this, we explore a novel research route by abstracting the movement of pixels in the ISTD process as the flow of fluid in fluid dynamics (FD). Specifically, a new Fluid Dynamics-Inspired Network (FDI-Net) is devised for ISTD. Based on the Taylor Central Difference (TCD) method, the TCD feature extraction block is designed, where convolution and Transformer structures are combined for local and global information. The pixel motion equation during the ISTD process is derived from the Navier–Stokes (N-S) equation, constructing an N-S Refinement Module that refines extracted features with edge details. Thus, the TCD feature extraction block determines the primary movement direction of pixels during detection, while the N-S Refinement Module corrects some skewed directions of the pixel stream to supplement the edge details. Experiments on IRSTD-1k and SIRST demonstrate that our method achieves SOTA performance in terms of evaluation metrics.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Recognition (object detection, categorization)
2094
Video Frame Interpolation with Densely Queried Bilateral Correlation
Chang Zhou, Jie Liu, Jie Tang, Gangshan Wu
Video Frame Interpolation (VFI) aims to synthesize non-existent intermediate frames between existent frames. Flow-based VFI algorithms estimate intermediate motion fields to warp the existent frames. Real-world motions’ complexity and the reference frame’s absence make motion estimation challenging. Many state-of-the-art approaches explicitly model the correlations between two neighboring frames for more accurate motion estimation. In common approaches, the receptive field of correlation modeling at higher resolution depends on the motion fields estimated beforehand. Such receptive field dependency makes common motion estimation approaches poor at coping with small and fast-moving objects. To better model correlations and to produce more accurate motion fields, we propose the Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem and thus is more friendly to small and fast-moving objects. The motion fields generated with the help of DQBC are further refined and up-sampled with context features. After the motion fields are fixed, a CNN-based SynthNet synthesizes the final interpolated frame. Experiments show that our approach enjoys higher accuracy and less inference time than the state-of-the-art. Source code is available at https://github.com/kinoud/DQBC.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding   
2099
Robust Reinforcement Learning via Progressive Task Sequence
Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu
Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address the robust RL problem by solving a max-min problem. The main idea is to maximize the cumulative reward under the worst-possible perturbations. However, worst-case optimization leads to either overly conservative solutions or an unstable training process, which further affects the policy robustness and generalization performance. In this paper, we tackle this problem from both formulation definition and algorithm design. First, we formulate robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both the worst cases and the non-worst cases. Then, we propose a novel framework, DRRL, to solve the max-expectation optimization. Given our definition of the feasible tasks, a task generation and sequencing mechanism is introduced to dynamically output tasks at an appropriate difficulty level for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning to improve the policy robustness and the training stability. Finally, extensive experiments demonstrate that the proposed method exhibits significant performance on the unmanned CarRacing game and multiple high-dimensional MuJoCo environments.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
2103
Explainable Reinforcement Learning via a Causal World Model
Zhongwei Yu, Jingqing Ruan, Dengpeng Xing
Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Causality
Machine Learning -> ML: Reinforcement learning
2106
Handling Learnwares Developed from Heterogeneous Feature Spaces without Auxiliary Data
Peng Tan, Zhi-Hao Tan, Yuan Jiang, Zhi-Hua Zhou
The learnware paradigm proposed by Zhou [2016] aims to provide a model-sharing platform where users can solve their problems by reusing existing efforts instead of starting from scratch. A learnware consists of a well-performed trained model and the specification which enables the model to be adequately identified according to the user’s requirement. The previous studies concentrated on the homogeneous case where models share the same feature space. However, in real-world scenarios, models usually come from different feature spaces. If the learnware market can handle heterogeneous feature spaces, all well-performed models built for a particular task with arbitrary feature spaces can be submitted to the market, and the market can accommodate them well and identify the helpful model for the user even if the feature space is not aligned. This problem would be easier with the help of extra auxiliary data connecting all different feature spaces, but such data can hardly be obtained in reality. In this paper, we provide a general framework for accommodating heterogeneous learnwares without using any auxiliary data. The key idea is to exploit the specifications to construct the relationship between different feature spaces. Furthermore, we give a matrix factorization-based implementation of the overall procedure for constructing and exploiting the heterogeneous learnware market. Experiments on real-world tasks validate the efficacy of our method.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Multi-task and transfer learning
2118
Hawkes Process Based on Controlled Differential Equations
Minju Jo, Seungji Kook, Noseong Park
Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes process based on controlled differential equations (HP-CDE), by adopting the neural controlled differential equation (neural CDE) technology which is an analogue to continuous RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. Moreover, as both Hawkes processes and neural CDEs are first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Mining text, web, social media
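For readers unfamiliar with the quantities mentioned in the Hawkes process abstract above, the following is a minimal sketch of a classical univariate Hawkes process with an exponential kernel. It is not the HP-CDE model from the paper; the parameter names `mu`, `alpha`, `beta` and their default values are assumptions chosen only for illustration. It shows how irregular event times enter the conditional intensity and how the log-likelihood splits into an event term and an integral (compensator) term, which has a closed form for this simple kernel.

```python
import math


def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity at time t given strictly earlier event times.

    lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)) over t_i < t.
    mu, alpha, beta are illustrative values, not taken from the paper.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


def log_likelihood(events, horizon, mu=0.2, alpha=0.8, beta=1.0):
    """Log-likelihood of event times observed on the window [0, horizon]."""
    # Event term: log-intensity evaluated at every observed event time.
    log_term = sum(
        math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events
    )
    # Compensator: integral of the intensity over [0, horizon] in closed form.
    compensator = mu * horizon + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (horizon - ti))) for ti in events
    )
    return log_term - compensator


if __name__ == "__main__":
    irregular_events = [0.5, 0.7, 3.1]  # unevenly spaced timestamps
    print(hawkes_intensity(1.0, irregular_events))
    print(log_likelihood(irregular_events, horizon=5.0))
```

With the exponential kernel the integral term is available analytically; the abstract's point is that neural variants typically lose this property and have to approximate it.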
2120
Truthful Auctions for Automated Bidding in Online Advertising
Yidan Xing, Zhilin Zhang, Zhenzhe Zheng, Chuan Yu, Jian Xu, Fan Wu, Guihai Chen
Automated bidding, an emerging intelligent decision-making paradigm powered by machine learning, has become popular in online advertising. Advertisers in automated bidding evaluate the cumulative utilities and have private financial constraints over multiple ad auctions in a long-term period. Based on these distinct features, we consider a new ad auction model for automated bidding: the values of advertisers are public while the financial constraints, such as budget and return on investment (ROI) rate, are private types. We derive the truthfulness conditions with respect to private constraints for this multi-dimensional setting, and demonstrate that any feasible allocation rule can be equivalently reduced to a series of non-decreasing functions on budget. However, the resulting allocation mapped from these non-decreasing functions generally follows an irregular shape, making it difficult to obtain a closed-form expression for the auction objective. To overcome this design difficulty, we propose a family of truthful automated bidding auctions with personalized rank scores, similar to the Generalized Second-Price (GSP) auction. The intuition behind our design is to leverage personalized rank scores as the criteria to allocate items, and compute a critical ROI to transform the constraints on budget to the same dimension as ROI. The experimental results demonstrate that the proposed auction mechanism outperforms the widely used ad auctions, such as first-price auction and second-price auction, in various automated bidding environments.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design
2144
Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels
Hao-Tian Li, Tong Wei, Hao Yang, Kun Hu, Chong Peng, Li-Bo Sun, Xun-Liang Cai, Min-Ling Zhang
Deep neural networks have shown promising results on a wide variety of tasks using large-scale and well-annotated training datasets. However, data collected from real-world applications can suffer from two prevalent biases, i.e., long-tailed class distribution and label noise. Previous efforts on long-tailed learning and label-noise learning can only address a single type of data bias, leading to a severe deterioration of their performance. In this paper, we propose a distance-based sample selection algorithm called Stochastic Feature Averaging (SFA), which fits a Gaussian using the exponential running average of class centroids to capture uncertainty in representation space due to label noise and data scarcity. With SFA, we detect noisy samples based on their distances to class centroids sampled from this Gaussian distribution. Based on the identified clean samples, we then propose to train an auxiliary balanced classifier to improve the generalization for the minority class and facilitate the update of Gaussian parameters. Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. Further, we propose to combine SFA with the sample-selection approach, distribution-robust, and noise-robust loss functions, resulting in significant improvement in performance over the baselines. Our code is available at https://github.com/HotanLee/SFA
List of keywords
Machine Learning -> ML: Weakly supervised learning
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning
2147
On the Complexity of Counterfactual Reasoning
Yunqiu Han, Yizuo Chen, Adnan Darwiche
We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks—used in standard counterfactual reasoning that contemplates two worlds, real and imaginary—to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with partially specified SCMs that are coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Knowledge Representation and Reasoning -> KRR: Causality
Uncertainty in AI -> UAI: Bayesian networks
2160
Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose
Yichen Zhang, Jiehong Lin, Ke Chen, Zelin Xu, Yaowei Wang, Kui Jia
The domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, into a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations of regression tasks. Moreover, learning unified implicit neural functions to estimate relative direction and distance of targets to their nearest class bins aims to refine target classification predictions, which can gain robust performance against inconsistent feature scaling sensitive to UDA regressors. Experimental results on three public benchmarks of the challenging 6D pose estimation task verify the effectiveness of our method, which consistently achieves superior performance to the state-of-the-art for UDA on 6D pose estimation. Code and pre-trained models are available at https://github.com/Gorilla-Lab-SCUT/MAST.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
2178
On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling
Zangir Iklassov, Dmitrii Medvedev, Ruben Solozabal Ochoa de Retana, Martin Takac
This paper studies the use of Curriculum Learning on Reinforcement Learning (RL) to improve the performance of the dispatching policies learned on the Job-shop Scheduling Problem (JSP). Current works in the literature present a large optimality gap when learning end-to-end solutions on this problem. In this regard, we identify the difficulty for RL to learn directly on large instances as part of the issue and use Curriculum Learning (CL) to mitigate this effect. In particular, CL sequences the learning process in a curriculum of tasks of increasing complexity, which allows learning on large instances that otherwise would be impossible to learn from scratch. In this paper, we present a size-agnostic model that enables us to demonstrate that current curriculum strategies have a major impact on the quality of the solution inferred. In addition, we introduce a novel Reinforced Adaptive Staircase Curriculum Learning (RASCL) strategy, which adjusts the difficulty level during the learning process by revisiting the worst-performing instances. Experiments on Taillard’s and Demirkol’s datasets show that the presented approach significantly improves on the current state-of-the-art models on the JSP. It reduces the average optimality gap from 19.35% to 10.46% on Taillard’s instances and from 38.43% to 18.85% on Demirkol’s instances.
List of keywords
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Scheduling
2184
Exploring Safety Supervision for Continual Test-time Domain Adaptation
Xu Yang, Yanan Gu, Kun Wei, Cheng Deng
Continual test-time domain adaptation aims to adapt a source pre-trained model to a continually changing target domain without using any source data. Unfortunately, existing methods based on pseudo-label learning suffer from the changing target domain environment, and the quality of generated pseudo-labels is attenuated due to the domain shift, leading to instantaneous negative learning and long-term knowledge forgetting. To solve these problems, in this paper, we propose a simple yet effective framework for exploring safety supervision with three elaborate strategies: Label Safety, Sample Safety, and Parameter Safety. Firstly, to select reliable pseudo-labels, we define and adjust the confidence threshold in a self-adaptive manner according to the test-time learning status. Secondly, a soft-weighted contrastive learning module is presented to explore the highly-correlated samples and discriminate uncorrelated ones, improving the instantaneous efficiency of the model. Finally, we frame a Soft Weight Alignment strategy to normalize the distance between the parameters of the adapted model and the source pre-trained model, which alleviates the long-term problem of knowledge forgetting and significantly improves the accuracy of the adapted model in the late adaptation stage. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on several benchmark datasets.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Representation learning
2193
Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences
Jie Qiao, Ruichu Cai, Siyu Wu, Yu Xiang, Keli Zhang, Zhifeng Hao
Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable beyond applications, especially when dealing with discrete-time event sequences at low resolution; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., the causal relationship that occurred simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structural Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among event types in discrete-time event sequences. The proposed method features Expectation-Maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
2225
Privacy-Preserving End-to-End Spoken Language Understanding
Yinggui Wang, Wei Huang, Le Yang
Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can contain a lot of user-sensitive information, such as gender, identity, and sensitive content. New types of security and privacy breaches have thus emerged. Users do not want to expose their personal sensitive information to malicious attacks by untrusted third parties. Thus, the SLU system needs to ensure that a potential malicious attacker cannot deduce the sensitive attributes of the users, while it should avoid greatly compromising the SLU accuracy. To address the above challenge, this paper proposes a novel SLU multi-task privacy-preserving model to prevent both the speech recognition (ASR) and identity recognition (IR) attacks. The model uses the hidden layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, and the other two types of information are removed to obtain a privacy-secure hidden layer. In order to achieve good balance between efficiency and privacy, we introduce a new mechanism of model pre-training, namely joint adversarial training, to further enhance the user privacy. Experiments over two SLU datasets show that the proposed method can reduce the accuracy of both the ASR and IR attacks close to that of a random guess, while leaving the SLU performance largely unaffected.
List of keywords
Natural Language Processing -> NLP: Dialogue and interactive systems
Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Speech
2228
CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation
Tao Lei, Rui Sun, Xuan Wang, Yingbo Wang, Xi He, Asoke Nandi
Hybrid architectures of convolutional neural networks (CNNs) and Transformers are very popular for medical image segmentation. However, they suffer from two challenges. First, although a CNN branch can capture the local image features using vanilla convolution, it cannot achieve adaptive feature learning. Second, although a Transformer branch can capture the global features, it ignores the channel and cross-dimensional self-attention, resulting in a low segmentation accuracy on complex-content images. To address these challenges, we propose a novel hybrid architecture of convolutional neural networks hand in hand with vision Transformers (CiT-Net) for medical image segmentation. Our network has two advantages. First, we design a dynamic deformable convolution and apply it to the CNN branch, which overcomes the weak feature extraction ability due to fixed-size convolution kernels and the stiff design of sharing kernel parameters among different inputs. Second, we design a shifted-window adaptive complementary attention module and a compact convolutional projection. We apply them to the Transformer branch to learn the cross-dimensional long-term dependency for medical images. Experimental results show that our CiT-Net provides better medical image segmentation results than popular SOTA methods. Besides, our CiT-Net requires fewer parameters and lower computational cost and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/CiT-Net.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision
2229
On Lower Bounds for Maximin Share Guarantees
Halvard Hummel
We study the problem of fairly allocating a set of indivisible items to a set of agents with additive valuations. Recently, Feige et al. (WINE’21) proved that a maximin share (MMS) allocation exists for all instances with n agents and no more than n + 5 items. Moreover, they proved that an MMS allocation is not guaranteed to exist for instances with 3 agents and at least 9 items, or n ≥ 4 agents and at least 3n + 3 items. In this work, we shrink the gap between these upper and lower bounds for guaranteed existence of MMS allocations. We prove that for any integer c > 0, there exists a number of agents n_c such that an MMS allocation exists for any instance with n ≥ n_c agents and at most n + c items, where n_c ≤ ⌊0.6597^c · c!⌋ for allocation of goods and n_c ≤ ⌊0.7838^c · c!⌋ for chores. Furthermore, we show that for n ≠ 3 agents, all instances with n + 6 goods have an MMS allocation.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
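To make the maximin share (MMS) notion in the abstract above concrete, here is a minimal brute-force sketch for a single agent with additive valuations. The function name `maximin_share` and the example numbers are illustrative assumptions, not taken from the paper. An agent's MMS is the best value they can guarantee for their worst bundle when they themselves split the items into n bundles. The enumeration below takes time exponential in the number of items and is meant only for tiny instances.

```python
from itertools import product


def maximin_share(values, n):
    """Brute-force maximin share of one agent with additive valuations.

    values: the agent's value for each indivisible item
            (non-negative for goods, non-positive for chores).
    n:      number of agents, i.e. number of bundles in each partition.
    """
    best = None
    # Every assignment of items to bundle indices 0..n-1 is one partition.
    for assignment in product(range(n), repeat=len(values)):
        bundle_totals = [0] * n
        for item_value, bundle in zip(values, assignment):
            bundle_totals[bundle] += item_value
        worst_bundle = min(bundle_totals)
        if best is None or worst_bundle > best:
            best = worst_bundle
    return best


if __name__ == "__main__":
    # Two agents, five goods: the split {3, 3} and {2, 2, 2} gives 6 each.
    print(maximin_share([3, 3, 2, 2, 2], 2))
```

An allocation is an MMS allocation when every agent receives a bundle worth at least their own maximin share; the abstract concerns for which numbers of agents and items such an allocation is guaranteed to exist.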
2230
Optimal Decision Trees For Interpretable Clustering with Constraints
Pouya Shati, Eldan Cohen, Sheila McIlraith
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions; however, existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Machine Learning -> ML: Clustering
2241
Uncovering the Largest Community in Social Networks at Scale
Shohei Matsugu, Yasuhiro Fujiwara, Hiroaki Shiokawa
The Maximum k-Plex Search (MPS) can find the largest k-plex, which is a generalization of the largest clique. Although MPS is commonly used in AI to effectively discover real-world communities of social networks, existing MPS algorithms suffer from high computational costs because they iteratively scan numerous nodes to find the largest k-plex. Here, we present an efficient MPS algorithm called Branch-and-Merge (BnM), which outputs an exact maximum k-plex. BnM merges unnecessary nodes to explore a smaller graph than the original one. Extensive evaluations on real-world social networks demonstrate that BnM significantly outperforms other state-of-the-art MPS algorithms in terms of running time.
List of keywords
Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Applications
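As background for the k-plex abstract above, the sketch below checks the standard k-plex condition and finds a maximum k-plex by exhaustive search on a toy graph. It is not the Branch-and-Merge (BnM) algorithm from the paper. The function names and the example graph are illustrative assumptions, and connectivity of the k-plex is not enforced here. A vertex set S is a k-plex when every vertex in S is adjacent to at least |S| - k other vertices of S, so a 1-plex is a clique.

```python
from itertools import combinations


def is_k_plex(adj, nodes, k):
    """Return True if `nodes` induces a k-plex in the graph `adj`.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    """
    s = set(nodes)
    # Each vertex may miss at most k vertices of s, counting itself.
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def maximum_k_plex(adj, k):
    """Exhaustive search from the largest size down; for tiny graphs only."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if is_k_plex(adj, candidate, k):
                return set(candidate)
    return set()


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(maximum_k_plex(graph, 1))  # largest clique
    print(maximum_k_plex(graph, 3))  # relaxed: the whole graph qualifies
```

The exhaustive search inspects up to 2^|V| subsets, which illustrates why the abstract emphasizes reducing the number of nodes that must be explored.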
2250
DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series
Chaoqun Wang, Yijun Li, Xiangqian Sun, Qi Wu, Dongdong Wang, Zhixiang Huang
Time series forecasting is prevalent in various real-world applications. Despite the promising results of deep learning models in time series forecasting, especially the Recurrent Neural Networks (RNNs), the explanations of time series models, which are critical in high-stakes applications, have received little attention. In this paper, we propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM. Conventionally, the interpretability of RNNs only concentrates on the variable importance and time importance. We additionally distinguish between the instantaneous influence of newly arriving data and the long-term effects of historical data. Specifically, DeLELSTM consists of two components, i.e., standard LSTM and tensorized LSTM. The tensorized LSTM assigns each variable a unique hidden state, which together make up a matrix $h_t$, and the standard LSTM models all the variables with a shared hidden state $H_t$. By decomposing $H_t$ into a linear combination of the past information $h_{t-1}$ and the fresh information $h_t - h_{t-1}$, we can get the instantaneous influence and the long-term effect of each feature. In addition, the advantage of linear regression also makes the explanation transparent and clear. We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets. Extensive experiments show that the proposed method achieves competitive performance against the baseline methods and provides a reliable explanation relative to domain knowledge.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Time series and data streams
2261
Contour-based Interactive Segmentation
Polina Popenova, Danil Galeev, Anna Vorontsova, Anton Konushin
Recent advances in interactive segmentation (IS) allow speeding up and simplifying image editing and labeling greatly. The majority of modern IS approaches accept user input in the form of clicks. However, using clicks may require too many user interactions, especially when selecting small objects, minor parts of an object, or a group of objects of the same type. In this paper, we consider such a natural form of user interaction as a loose contour, and introduce a contour-based IS method. We evaluate the proposed method on the standard segmentation benchmarks, our novel UserContours dataset, and its subset UserContours-G containing difficult segmentation cases. Through experiments, we demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the required amount of user interactions.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision
2271
VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data
Yongxin Xu, Kai Yang, Chaohe Zhang, Peinie Zou, Zhiyuan Wang, Hongxin Ding, Junfeng Zhao, Yasha Wang, Bing Xie
Due to the insufficiency of electronic health records (EHR) data utilized in practical diagnosis prediction scenarios, most works are devoted to learning powerful patient representations either from structured EHR data (e.g., temporal medical events, lab test results, etc.) or unstructured data (e.g., clinical notes, etc.). However, synthesizing rich information from both of them still needs to be explored. Firstly, the heterogeneous semantic biases across them heavily hinder the synthesis of representation spaces, which is critical for diagnosis prediction. Secondly, the intermingled quality of partial clinical notes leads to inadequate representations of to-be-predicted patients. Thirdly, typical attention mechanisms mainly focus on aggregating information from similar patients, ignoring important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, which indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Representation learning
2272
Adaptive Estimation Q-learning with Uncertainty and Familiarity
Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li
One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimations. The most widely used off-policy algorithms currently suffer from over- or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control the value estimation naturally and can adaptively change for a specific state-action pair. We theoretically prove the property of our familiarity term, which can even keep the expected estimation bias approximately 0, and experimentally demonstrate that our dynamic estimation can improve the performance and prevent the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied in any off-policy actor-critic algorithm.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Ensemble methods
2276
Commonsense Knowledge Enhanced Sentiment Dependency Graph for Sarcasm Detection
Zhe Yu, Di Jin, Xiaobao Wang, Yawen Li, Longbiao Wang, Jianwu Dang
Sarcasm is widely utilized on social media platforms such as Twitter and Reddit. Sarcasm detection is required for analyzing people’s true feelings since sarcasm is commonly used to portray a reversed emotion opposing the literal meaning. The syntactic structure is the key to making better use of commonsense when detecting sarcasm. However, it is extremely challenging to effectively and explicitly explore the information implied in syntactic structure and commonsense simultaneously. In this paper, we apply the pre-trained COMET model to generate relevant commonsense knowledge, and explore a novel scenario of constructing a commonsense-augmented sentiment graph and a commonsense-replaced dependency graph for each text. Based on this, a Commonsense Sentiment Dependency Graph Convolutional Network (CSDGCN) framework is proposed to explicitly depict the role of external commonsense and inconsistent expressions over the context for sarcasm detection by interactively modeling the sentiment and dependency information. Experimental results on several benchmark datasets reveal that our proposed method beats the state-of-the-art methods in sarcasm detection, and has stronger interpretability.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Knowledge-aided learning
Natural Language Processing -> NLP: Text classification
2296
Complete Instances Mining for Weakly Supervised Instance Segmentation
Zecheng Li
Weakly supervised instance segmentation (WSIS) using only image-level labels is a challenging task due to the difficulty of aligning coarse annotations with the finer task. However, with the advancement of deep neural networks (DNNs), WSIS has garnered significant attention. Following a proposal-based paradigm, we encounter a redundant segmentation problem resulting from a single instance being represented by multiple proposals. To address this problem, we propose a novel approach for WSIS that focuses on the online refinement of complete instances through the use of a MaskIoU head to predict the quality of proposals and a Complete Instances Mining (CIM) strategy to explicitly model the redundant segmentation problem and generate refined pseudo labels. Our approach allows the network to become aware of multiple instances and complete instances, and we further improve its robustness through the incorporation of an Anti-noise strategy. Empirical evaluations on the PASCAL VOC 2012 and MS COCO datasets demonstrate that our method achieves state-of-the-art performance with a notable margin.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Weakly supervised learning
2309
Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables
Huawen Shen, Xiang Gao, Jin Wei, Liang Qiao, Yu Zhou, Qiang Li, Zhanzhan Cheng
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as an image captioning problem, i.e., the input is a single-table image and the output is a table structure description in a specific text format, e.g., HTML. With the impressive success of Transformer in text generation tasks, these methods use Transformer architecture to predict HTML table text in an autoregressive manner. However, tables always emerge with a large variety of shapes and sizes. Autoregressive models usually suffer from the error accumulation problem as the length of predicted text increases, which results in unsatisfactory performance for large tables. In this paper, we propose a novel image-to-text based TSR method that relieves error accumulation problems and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting HTML table cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates an HTML tag sequence with a vertical-and-horizontal two-step 'scanning', which better fits the inherent 2D structure of image data, (2) it performs substantially better for large tables (long sequence prediction) since it alleviates the error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Applications
2321
Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization
Jun Yu, Yingshuai Zheng, Shulan Ruan, Qi Liu, Zhiyuan Cheng, Jinze Wu
The key to video action detection lies in the understanding of interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimate the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full utilization of it still remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers with different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relation across continuing video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.
List of keywords
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Video analysis and understanding
2338
CSGCL: Community-Strength-Enhanced Graph Contrastive Learning
Han Chen, Ziwen Zhao, Yuhua Li, Yixiong Zou, Ruixuan Li, Rui Zhang
Graph Contrastive Learning (GCL) is an effective way to learn generalized graph representations in a self-supervised manner, and has grown rapidly in recent years. However, the underlying community semantics has not been well explored by most previous GCL methods. Research that attempts to leverage communities in GCL regards them as having the same influence on the graph, leading to extra representation errors. To tackle this issue, we define "community strength" to measure the difference of influence among communities. Under this premise, we propose a Community-Strength-enhanced Graph Contrastive Learning (CSGCL) framework to preserve community strength throughout the learning process. Firstly, we present two novel graph augmentation methods, Communal Attribute Voting (CAV) and Communal Edge Dropping (CED), where the perturbations of node attributes and edges are guided by community strength. Secondly, we propose a dynamic "Team-up" contrastive learning scheme, where community strength is used to progressively fine-tune the contrastive objective. We report extensive experiment results on three downstream tasks: node classification, node clustering, and link prediction. CSGCL achieves state-of-the-art performance compared with other GCL methods, validating that community strength brings effectiveness and generality to graph representations. Our code is available at https://github.com/HanChen-HUST/CSGCL.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Self-supervised Learning
2339
Generalized Discriminative Deep Non-Negative Matrix Factorization Based on Latent Feature and Basis Learning
Zijian Yang, Zhiwei Li, Lu Sun
As a powerful tool for data representation, deep NMF has attracted a lot of attention in recent years. Current deep NMF builds the multi-layer structure by decomposing either the basis matrix or the feature matrix into multiple factors, which can complicate the learning process when data is insufficient or exhibits a simple representation. To overcome these limitations, a novel algorithm called Generalized Deep Non-negative Matrix Factorization (GDNMF) is proposed, which generalizes several NMF and deep NMF methods in a unified framework. GDNMF simultaneously performs decomposition on both features and bases, which learns a hierarchical data representation based on a multi-level basis. To further improve the latent representation and enhance its flexibility, GDNMF mutually reinforces the shallow linear model and the deep non-linear model. Moreover, to utilize the limited prior information, semi-supervised GDNMF is proposed by treating partial label information as soft constraints in the multi-layer structure. An efficient two-phase algorithm is developed to optimize the proposed problem. Experiment results on five real-world datasets verify the superior performance of GDNMF compared with state-of-the-art NMF-based methods.
List of keywords
Machine Learning -> ML: Clustering
Machine Learning -> ML: Weakly supervised learning
2358
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task. However, because they lack the perception of linguistic knowledge and information, recent vision models suffer from two problems: (1) the pure vision-based query results in attention drift, which usually causes poor recognition and is summarized as the linguistic insensitive drift (LID) problem in this paper; (2) the visual feature is suboptimal for recognition in some vision-missing cases (e.g., occlusion). To address these issues, we propose a Linguistic Perception Vision model (LPV), which explores the linguistic capability of the vision model for accurate text recognition. To alleviate the LID problem, we introduce a Cascade Position Attention (CPA) mechanism that obtains high-quality and accurate attention maps through step-wise optimization and linguistic information mining. Furthermore, a Global Linguistic Reconstruction Module (GLRM) is proposed to improve the representation of visual features by perceiving the linguistic information in the visual space, which gradually converts visual features into semantically rich ones during the cascade process. Different from previous methods, our method obtains SOTA results while keeping low complexity (92.4% accuracy with only 8.11M parameters). Code is available at https://github.com/CyrilSterling/LPV.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Vision and language
2362
G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
QIN ZHANG, Ze Lin Shi, Xiaolin Zhang, Xiaojun Chen, Philippe Fournier-Viger, Shirui Pan
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real life, models are often applied to data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies, are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one by augmenting it with an extra proxy classifier. Under the constraint of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, G2Pxy does not have specific requirements on the GNN architecture and shows good generalization.
List of keywords
Machine Learning -> ML: Classification
Data Mining -> DM: Mining graphs
2366
Enabling Abductive Learning to Exploit Knowledge Graph
Yu-Xuan Huang, Zequn Sun, Guangyao Li, Xiaobin Tian, Wang-Zhou Dai, Wei Hu, Yuan Jiang, Zhi-Hua Zhou
Most systems integrating data-driven machine learning with knowledge-driven reasoning usually rely on a specifically designed knowledge base to enable efficient symbolic inference. However, it could be cumbersome for the nonexpert end-users to prepare such a knowledge base in real tasks. Recent years have witnessed the success of large-scale knowledge graphs, which could be ideal domain knowledge resources for real-world machine learning tasks. However, these large-scale knowledge graphs usually contain much information that is irrelevant to a specific learning task. Moreover, they often contain a certain degree of noise. Existing methods can hardly make use of them because the large-scale probabilistic logical inference is usually intractable. To address these problems, we present ABductive Learning with Knowledge Graph (ABL-KG) that can automatically mine logic rules from knowledge graphs during learning, using a knowledge forgetting mechanism for filtering out irrelevant information. Meanwhile, these rules can form a logic program that enables efficient joint optimization of the machine learning model and logic inference within the Abductive Learning (ABL) framework. Experiments on four different tasks show that ABL-KG can automatically extract useful rules from large-scale and noisy knowledge graphs, and significantly improve the performance of machine learning with only a handful of labeled data.
List of keywords
Machine Learning -> ML: Knowledge-aided learning
Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Machine Learning -> ML: Weakly supervised learning
2409
DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency
Yike Yuan, Xinghe Fu, Yunlong Yu, Xi Li
In this paper, we propose a simple yet effective transformer framework for self-supervised learning called DenseDINO to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but that is neglected by existing self-supervised transformers, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces some extra input tokens called reference tokens to match the point-level features with the position prior. With the reference tokens, the model can maintain spatial consistency and deal with multi-object complex scene images, thus generalizing better on dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large margin (+7.2% mIoU) improvement in semantic segmentation on PascalVOC under the linear evaluation protocol.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Representation learning
2446
Active Visual Exploration Based on Attention-Map Entropy
Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzcinski
Active visual exploration addresses the issue of limited sensor capabilities in real-world scenarios, where successive observations are actively chosen based on the environment. To tackle this problem, we introduce a new technique called Attention-Map Entropy (AME). It leverages the internal uncertainty of the transformer-based model to determine the most informative observations. In contrast to existing solutions, it does not require additional loss components, which simplifies the training. Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction and classification on publicly available datasets.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Attention models
Robotics -> ROB: Robotics and vision
2451
Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning
Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan
Sharing intentions is crucial for efficient cooperation in communication-enabled multi-agent reinforcement learning. Recent work applies static or undirected graphs to determine the order of interaction. However, the static graph is not general for complex cooperative tasks, and the parallel message-passing update in the undirected graph with cycles cannot guarantee convergence. To solve this problem, we propose Deep Hierarchical Communication Graph (DHCG) to learn the dependency relationships between agents based on their messages. The relationships are formulated as directed acyclic graphs (DAGs), where the selection of the proper topology is viewed as an action and trained in an end-to-end fashion. To eliminate the cycles in the graph, we apply an acyclicity constraint as intrinsic rewards and then project the graph in the admissible solution set of DAGs. As a result, DHCG removes redundant communication edges for cost improvement and guarantees convergence. To show the effectiveness of the learned graphs, we propose policy-based and value-based DHCG. Policy-based DHCG factorizes the joint policy in an auto-regressive manner, and value-based DHCG factorizes the joint value function to individual value functions and pairwise payoff functions. Empirical results show that our method improves performance across various cooperative multi-agent tasks, including Predator-Prey, Multi-Agent Coordination Challenge, and StarCraft Multi-Agent Challenge.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
2459
Scalable Coupling of Deep Learning with Logical Reasoning
Marianne Defresne, Sophie Barbe, Thomas Schiex
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. We empirically show that our loss function is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs, such as the symbolic, visual or many-solutions Sudoku problems, as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and a posteriori control over predictions.
List of keywords
Machine Learning -> ML: Neuro-symbolic methods
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
2466
CONGREGATE: Contrastive Graph Clustering in Curvature Spaces
Li Sun, Feiyang Wang, Junda Ye, Hao Peng, Philip S. Yu
Graph clustering is a longstanding research topic, and has achieved remarkable success with deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach where we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Clustering
2470
LGI-GT: Graph Transformers with Local and Global Operators Interleaving
Shuo Yin, Guoqiang Zhong
Since Transformers can alleviate some critical and fundamental problems of graph neural networks (GNNs), such as over-smoothing, over-squashing and limited expressiveness, they have been successfully applied to graph representation learning and achieved impressive results. However, although many works are dedicated to making graph Transformers (GTs) aware of the structure and edge information by specifically tailored attention forms or graph-related positional and structural encodings, few works address the problem of how to construct high-performing GTs with modules of GNNs and Transformers. In this paper, we propose a novel graph Transformer with local and global operators interleaving (LGI-GT), in which we further design a new method propagating embeddings of the [CLS] token for global information representation. Additionally, we propose an effective message passing module called edge enhanced local attention (EELA), which makes LGI-GT a full-attention GT. Extensive experiments demonstrate that LGI-GT performs consistently better than previous state-of-the-art GNNs and GTs, while ablation studies show the effectiveness of the proposed LGI scheme and EELA. The source code of LGI-GT is available at https://github.com/lgi-gt/LGI-GT.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
2471
Reverse Engineering of Temporal Queries Mediated by LTL Ontologies
Marie Fortin, Boris Konev, Vladislav Ryzhikov, Yury Savateev, Frank Wolter, Michael Zakharyaschev
In reverse engineering of database queries, we aim to construct a query from a given set of answers and non-answers; it can then be used to explore the data further or as an explanation of the answers and non-answers. We investigate this query-by-example problem for queries formulated in positive fragments of linear temporal logic LTL over timestamped data, focusing on the design of suitable query languages and the combined and data complexity of deciding whether there exists a query in the given language that separates the given answers from non-answers. We consider both plain LTL queries and those mediated by LTL ontologies.
List of keywords
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
2479
U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation
Zizhuo Li, Shihua Zhang, Jiayi Ma
Local context capturing has become the core factor for achieving leading performance in two-view correspondence learning. Recent advances have devised various local context extractors, but these typically adopt explicit neighborhood relation modeling that is restricted and inflexible. To address this issue, we introduce U-Match, an attentional graph neural network that has the flexibility to enable implicit local context awareness at multiple levels. Specifically, a hierarchy-aware graph representation (HAGR) module is designed and fleshed out by local context pooling and unpooling operations. The former encodes local context by adaptively sampling a set of nodes to form a coarse-grained graph, while the latter decodes local context by recovering the coarsened graph back to its original size. Moreover, an orthogonal fusion module is proposed for the collaborative use of the HAGR module, which integrates complementary local and global information into compact feature representations without redundancy. Extensive experiments on different visual tasks show that our method significantly surpasses the state of the art. In particular, U-Match attains an AUC at a 5 degree threshold of 60.53% on the challenging YFCC100M dataset without RANSAC, outperforming the strongest prior model by 8.61 absolute percentage points. Our code is publicly available at https://github.com/ZizhuoLi/U-Match.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Image and video retrieval
2490
An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Mingliang Zhai, yulin li, xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei WU, Yunde Jia
Transformers achieve promising performance in structured document understanding owing to their complex computation, but remain inefficient in time complexity. Existing lightweight transformers fail to represent different granularities in documents. Therefore, it is difficult for them to achieve a good trade-off between efficiency and performance. In this paper, we present an hourglass architecture for high-performance, low-computation document understanding. Specifically, we design a modality-guided dynamic token merging block, which not only makes the model learn multi-granularity representations, but also reduces the number of tokens in the middle layers. Considering that multi-modal interaction is critical for guiding the merge, we develop a symmetry cross attention (SCA) to efficiently interact with multi-modal information. SCA allows one modality input as the query to calculate cross attention with another modality. Extensive experiments on the FUNSD, SROIE, and CORD datasets demonstrate that our model achieves state-of-the-art performance and 1.9x faster inference time than the state-of-the-art methods.
List of keywords
Natural Language Processing -> NLP: Information extraction
Machine Learning -> ML: Multi-modal learning
2491
CVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving
Zijian Song, Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhaoqi Wang
Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. Nowadays, we can easily access sensor data represented in multiple views. However, existing models do not take cross-view consistency as the main condition, i.e., there are divergences between the multimodal predictions from different views. Such predictions are neither practical nor effective when the network does not comprehend the 3D scene, which could put the downstream module in a dilemma. Our work models multimodality in a practical and reasonable way by maintaining cross-view consistency. We present a cross-view trajectory prediction method using shared 3D Queries (CVTP3D). We employ a set of 3D queries shared across views to generate multi-goals that are cross-view consistent. We also propose a random mask method and coarse-to-fine cross-attention to capture robust cross-view features. As far as we know, this is the first work that introduces the outstanding top-down paradigm from the BEV detection field to a trajectory prediction problem. The results of experiments on two publicly available datasets show that CVTP3D achieves state-of-the-art performance with consistent cross-view predictions.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Machine learning for vision
2511
Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer
Wenjun Ke, Zongkai Tian, Qi Liu, Peng Wang, Jinhua Gao, Rui Qi
Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. In order to help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or by using end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence the resulting sentence can retain syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., Ontonotes and Wikiann, demonstrate that SAINT performs comparably to the state-of-the-art baselines.
List of keywords
Natural Language Processing -> NLP: Language generation
Natural Language Processing -> NLP: Named entities
2512
Engineering an Efficient Approximate DNF-Counter
Mate Soos, Divesh Aggarwal, Sourav Chakraborty, Kuldeep S Meel, Maciej Obremski
Model counting is a fundamental problem with many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it intractable to solve exactly. Much research has therefore been focused on obtaining approximate solutions, particularly in the form of (ε, δ) approximations. The primary contribution of this paper is a new approach, called dnfstream, to approximate #DNF counting that achieves (nearly) optimal time complexity and outperforms existing FPRAS. Our approach is based on the recent breakthrough in the context of union of sets in streaming. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
2529
Efficient Online Decision Tree Learning with Active Feature Acquisition
Arman Rahbar, Ziyu Ye, Yuxin Chen, Morteza Haghir Chehreghani
Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real world applications, both feature values and the labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform (i.e., making costly feature queries) on a patient in order to make a diagnosis decision (i.e., predicting labels). We provide a fresh perspective to tackle this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values with a minimal cost, while using a posterior sampling scheme to maintain a low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.
List of keywords
Machine Learning -> ML: Online learning
Data Mining -> DM: Mining data streams
Machine Learning -> ML: Active learning
2543
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
Runzhong Zhang, Suchen Wang, Yueqi Duan, Yansong Tang, Yue Zhang, Yap-Peng Tan
In this paper, we propose an HOI-aware adaptive network named AdaAct for weakly-supervised action segmentation. Most existing methods learn a fixed network to predict the action of each frame with the neighboring frame features. However, this would result in ambiguity when estimating similar actions, such as pouring juice and pouring coffee. To address this, we aim to exploit temporally global but spatially local human-object interactions (HOI) as video-level prior knowledge for action segmentation. The long-term HOI sequence provides crucial contextual information to distinguish ambiguous actions, where our network dynamically adapts to the given HOI sequence at test time. More specifically, we first design a video HOI encoder that extracts, selects, and integrates the most representative HOI throughout the video. Then, we propose a two-branch HyperNetwork to learn an adaptive temporal encoder, which automatically adjusts the parameters based on the HOI information of various videos on the fly. Extensive experiments on two widely-used datasets including Breakfast and 50Salads demonstrate the effectiveness of our method under different evaluation metrics.
List of keywords
Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Action and behavior recognition
2554
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Youngjoon Yoo
Neural Architecture Search (NAS) aims to automatically excavate the optimal network architecture with superior test performance. Recent NAS approaches rely on the validation loss or accuracy to find the superior network for the target data. In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to the state-of-the-art NAS methods. Notably, the resultant architecture found by the flatness measure generalizes robustly with regard to various data distribution shifts (e.g., ImageNet-V2, -A, -O), as well as various tasks such as object detection and semantic segmentation.
List of keywords
Computer Vision -> CV: Machine learning for vision
2562
Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation
Songhao Jiang, Yan Chu, Zhengkui Wang, Tianxing Ma, 王 瀚麟, wenxuan lu, Tianning Zang, Wang Bo
Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model’s capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and wisely cuts and mixes the subsentences between the misclassified and correctly classified samples. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. Finally, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective and more explainable than typical classification models and than hidden-level and input-level mixup models.
List of keywords
Natural Language Processing -> NLP: Text classification
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Natural Language Processing -> NLP: Tools
2576
Synthesizing Resilient Strategies for Infinite-Horizon Objectives in Multi-Agent Systems
David Klaska, Antonin Kucera, Martin Kurečka, Vit Musil, Petr Novotný, Vojtech Rehak
We consider the problem of synthesizing resilient and stochastically stable strategies for systems of cooperating agents striving to minimize the expected time between consecutive visits to selected locations in a known environment. A strategy profile is resilient if it retains its functionality even if some of the agents fail, and stochastically stable if the visiting time variance is small. We design a novel specification language for objectives involving resilience and stochastic stability, and we show how to efficiently compute strategy profiles (for both autonomous and coordinated agents) optimizing these objectives. Our experiments show that our strategy synthesis algorithm can construct highly non-trivial and efficient strategy profiles for environments with general topology.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Planning and Scheduling -> PS: Robot planning
2577
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations
Achille Fokoue, Ibrahim Abdelaziz, Maxwell Crouse, Shajith Ikbal, Akihiro Kishimoto, Guilherme Lima, Ndivhuwo Makondo, Radu Marinescu
Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA, an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses this problem by using 1) improved Graph Neural Networks for learning name-invariant formula representations that are tailored to their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains with improvements of up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.
List of keywords
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Machine Learning -> ML: Applications
Machine Learning -> ML: Representation learning
2587
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening
Mingsong Li, Yikun Liu, Tao Xiao, Yuwen Huang, Gongping Yang
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.
List of keywords
Computer Vision -> CV: Applications
Machine Learning -> ML: Applications
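The alternation between a data step and a prior step described in the LGTEUN abstract follows the general shape of proximal gradient descent. The sketch below is a generic illustration only, not the authors' network: it assumes a least-squares data term and swaps in an L1 prior (soft-thresholding) where the paper uses a learned LGT-based denoiser. The function names, the matrix `A`, and the parameters `lam`, `step`, and `stages` are all hypothetical.

```python
import math

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (stand-in for a learned prior module)."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]

def proximal_gradient_descent(A, y, lam, step, stages):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with a fixed number of stages."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(stages):
        # Data step: gradient descent on the data-fidelity term.
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = matvec(At, residual)
        z = [xi - step * g for xi, g in zip(x, grad)]
        # Prior step: proximal mapping of the regularizer.
        x = soft_threshold(z, step * lam)
    return x

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(proximal_gradient_descent(identity, [3.0, 0.2], lam=0.5, step=1.0, stages=5))
```

Unfolding, in this picture, would correspond to fixing `stages` and replacing each stage's hand-written steps with trainable modules.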
2590
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing works in MARL. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with a rigorous $(\epsilon, \delta)$-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise as commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Agent communication
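For context, the baseline that the DPMAC abstract contrasts itself with, directly perturbing messages with predefined DP noise, can be sketched with the classical Gaussian mechanism. This is not the DPMAC sender itself. The helper names, the clipping bound, and the choice of L2 sensitivity are assumptions made for illustration; the noise scale uses the standard bound that holds for 0 < epsilon < 1.

```python
import math
import random

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("classical bound needs 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def clip_l2(message, bound):
    """Scale the message down so that its L2 norm is at most `bound`."""
    norm = math.sqrt(sum(x * x for x in message))
    if norm <= bound:
        return list(message)
    return [x * bound / norm for x in message]

def privatize(message, epsilon, delta, clip_norm, rng):
    """Clip a message, then add Gaussian noise calibrated to (epsilon, delta)."""
    clipped = clip_l2(message, clip_norm)
    # Assumption: any two clipped messages differ by at most 2 * clip_norm in L2.
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]

if __name__ == "__main__":
    rng = random.Random(0)
    print(privatize([3.0, 4.0, 0.0], epsilon=0.5, delta=1e-5, clip_norm=1.0, rng=rng))
```

Because the noise scale here is fixed in advance and independent of what the agents learn, it illustrates the instability that the abstract says a learned stochastic sender is meant to alleviate.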
2607
A Generalized Deep Markov Random Fields Framework for Fake News Detection
Yiqi Dong, Di Jin, Xiaobao Wang, Yawen Li, Xiaowen Su, Dongxiao He
Recently, the wanton dissemination of fake news on social media has adversely affected our lives, rendering automatic fake news detection a pressing issue. Current methods are often fully supervised and typically employ deep neural networks (DNN) to learn implicit relevance from labeled data, ignoring explicitly shared properties (e.g., inflammatory expressions) across fake news. To address this limitation, we propose a graph-theoretic framework, called Generalized Deep Markov Random Fields Framework (GDMRFF), that inherits the capability of deep learning while at the same time exploiting the correlations among the news articles (including labeled and unlabeled data). Specifically, we first leverage a DNN-based module to learn implicit relations, which we then reveal as the unary function of MRF. Pairwise functions with refining effects to encapsulate human insights are designed to capture the explicit association among all samples. Meanwhile, an event removal module is introduced to remove event impact on pairwise functions. Note that we train GDMRFF with the semi-supervised setting, which decreases the reliance on labeled data while maximizing the potential of unlabeled data. We further develop an Ambiguity Learning Guided MRF (ALGM) model as a concretization of GDMRFF. Experiments show that ALGM outperforms the compared methods significantly on two datasets, especially when labeled data is limited.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Web and social networks
Data Mining -> DM: Mining graphs
Natural Language Processing -> NLP: Text classification
2611
RuleMatch: Matching Abstract Rules for Semi-supervised Learning of Human Standard Intelligence Tests
Yunlong Xu, Lingxiao Yang, Hongzhi You, Zonglei Zhen, Da-Hui Wang, Xiaohong Wan, Xiaohua Xie, Ru-Yuan Zhang
Raven’s Progressive Matrices (RPM), one of the standard intelligence tests in human psychology, has recently emerged as a powerful tool for studying abstract visual reasoning (AVR) abilities in machines. Although existing computational models for RPM problems achieve good performance, they require a large number of labeled training examples for supervised learning. In contrast, humans can efficiently solve unlabeled RPM problems after learning from only a few example questions. Here, we develop a semi-supervised learning (SSL) method, called RuleMatch, to train deep models with a small number of labeled RPM questions along with other unlabeled questions. Moreover, instead of using pixel-level augmentation in object perception tasks, we exploit the nature of RPM problems and augment the data at the level of abstract rules. Specifically, we disrupt the possible rules contained among context images in an RPM question and force the two augmented variants of the same unlabeled sample to obey the same abstract rule and predict a common pseudo label for training. Extensive experiments show that the proposed RuleMatch achieves state-of-the-art performance on two popular RAVEN datasets. Our work makes an important stride in aligning abstract analogical visual reasoning abilities in machines and humans. Our Code is at https://github.com/ZjjConan/AVR-RuleMatch.
List of keywords
Computer Vision -> CV: Visual reasoning and symbolic representation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
2612
DFVSR: Directional Frequency Video Super-Resolution via Asymmetric and Enhancement Alignment Network
Shuting Dong, Feng Lu, Zhe Wu, Chun Yuan
Frequency-based methods have recently received much attention due to their impressive restoration of detail and structure in video super-resolution. However, most of these frequency-based methods have three major limitations: 1) insufficient exploration of object motion information, 2) inadequate enhancement for high-fidelity regions, and 3) loss of spatial information during convolution. In this paper, we propose a novel network, Directional Frequency Video Super-Resolution (DFVSR), to address these limitations. Specifically, we reconsider object motion from a new perspective and propose Directional Frequency Representation (DFR), which not only borrows the property of frequency representation of detail and structure information but also contains the direction information of the object motion that is extremely significant in videos. Based on this representation, we propose a Directional Frequency-Enhanced Alignment (DFEA) that uses double enhancements of task-related information to ensure the retention of high-fidelity frequency regions and generate a high-quality alignment feature. Furthermore, we design a novel Asymmetrical U-shaped network architecture to progressively fuse these alignment features and produce the final output. This architecture enables intercommunication between the same resolution levels of the encoder and decoder to supplement spatial information. Powered by the above designs, our method achieves superior performance over state-of-the-art models in both quantitative and qualitative evaluations.
List of keywords
Computer Vision -> CV: Image and video retrieval
Computer Vision -> CV: Other
2614
SS-BSN: Attentive Blind-Spot Network for Self-Supervised Denoising with Nonlocal Self-Similarity
Young-Joo Han, Ha-Jin Yu
Recently, numerous studies have been conducted on supervised learning-based image denoising methods. However, these methods rely on large-scale noisy-clean image pairs, which are difficult to obtain in practice. Denoising methods with self-supervised training that can be trained with only noisy images have been proposed to address the limitation. These methods are based on the convolutional neural network (CNN) and have shown promising performance. However, CNN-based methods do not consider using nonlocal self-similarities essential in the traditional method, which can cause performance limitations. This paper presents self-similarity attention (SS-Attention), a novel self-attention module that can capture nonlocal self-similarities to solve the problem. We focus on designing a lightweight self-attention module in a pixel-wise manner, which is nearly impossible to implement using the classic self-attention module due to the quadratically increasing complexity with spatial resolution. Furthermore, we integrate SS-Attention into the blind-spot network called self-similarity-based blind-spot network (SS-BSN). We conduct the experiments on real-world image denoising tasks. The proposed method quantitatively and qualitatively outperforms state-of-the-art methods in self-supervised denoising on the Smartphone Image Denoising Dataset (SIDD) and Darmstadt Noise Dataset (DND) benchmark datasets.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Self-supervised Learning
2619
StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset
Chaofan Huo, Ye Shi, Yuexin Ma, Lan Xu, Jingyi Yu, Jingya Wang
Modeling and capturing the 3D spatial arrangement of the human and the object is the key to perceiving 3D human-object interaction from monocular images. In this work, we propose to use the Human-Object Offset between anchors, which are densely sampled from the surfaces of the human mesh and object mesh, to represent the human-object spatial relation. Compared with previous works, which use a contact map or an implicit distance field to encode 3D human-object spatial relations, our method is a simple and efficient way to encode the highly detailed correlation between the human and object. Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image. During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples based on this posterior distribution and minimizing the 2D-3D corresponding reprojection loss. Extensive experimental results show that our method significantly outperforms the SOTA on two challenging benchmarks, the BEHAVE and InterCap datasets.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
2638
Prediction with Incomplete Data under Agnostic Mask Distribution Shift
Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou
Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., mask distribution may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion when learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on the intra-mask correlation and that between features and mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
2653
Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning
Kiarash Kazari, Ezzeldin Shereen, Gyorgy Dan
We consider the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning. We propose a decentralized scheme that allows agents to detect the abnormal behavior of one compromised agent. Our approach is based on a recurrent neural network (RNN) trained during cooperative learning to predict the action distribution of other agents based on local observations. The predicted distribution is used for computing a normality score for the agents, which allows the detection of the misbehavior of other agents. To explore the robustness of the proposed detection scheme, we formulate the worst-case attack against our scheme as a constrained reinforcement learning problem. We propose to compute an attack policy by optimizing the corresponding dual function using reinforcement learning. Extensive simulations on various multi-agent benchmarks show the effectiveness of the proposed detection scheme in detecting state-of-the-art attacks and in limiting the impact of undetectable attacks.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
2666
Deliberation and Voting in Approval-Based Multi-Winner Elections
Kanav Mehra, Nanda Kishore Sreenivas, Kate Larson
Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using established models from the opinion-dynamics literature and study the effect of different deliberation mechanisms on voting outcomes achieved when using well-studied voting rules. Our results show that deliberation generally improves welfare and representation guarantees, but the results are sensitive to how the deliberation process is organized. We also show, experimentally, that simple voting rules, such as approval voting, perform as well as more sophisticated rules such as proportional approval voting or method of equal shares if deliberation is properly supported. This has ramifications on the practical use of such voting rules in citizen-focused democratic processes.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
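Two of the voting rules named in the abstract above, approval voting and proportional approval voting (PAV), can be sketched as follows. This is a minimal illustration with a made-up ballot profile; the brute-force PAV search and the alphabetical tie-breaking are simplifying assumptions, and the deliberation model studied in the paper is not represented here.

```python
from itertools import combinations

def approval_voting(ballots, k):
    """Pick the k candidates with the most approvals (ties broken alphabetically)."""
    scores = {}
    for ballot in ballots:
        for candidate in ballot:
            scores[candidate] = scores.get(candidate, 0) + 1
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return set(ranked[:k])

def pav_score(ballots, committee):
    """Each voter with j approved winners contributes 1 + 1/2 + ... + 1/j."""
    total = 0.0
    for ballot in ballots:
        hits = len(ballot & committee)
        total += sum(1.0 / j for j in range(1, hits + 1))
    return total

def proportional_approval_voting(ballots, candidates, k):
    """Brute-force search for a size-k committee with maximum PAV score."""
    best = max(combinations(sorted(candidates), k),
               key=lambda committee: pav_score(ballots, set(committee)))
    return set(best)

if __name__ == "__main__":
    # Hypothetical profile: three voters approve {a, b}, two voters approve {c}.
    ballots = [{"a", "b"}] * 3 + [{"c"}] * 2
    print(approval_voting(ballots, 2))
    print(proportional_approval_voting(ballots, {"a", "b", "c"}, 2))
```

On this profile, approval voting elects the majority's two candidates, while PAV gives one seat to the minority's candidate, which is the kind of representation difference the abstract compares.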
2670
FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation
Hanlin Gu, Jiahuan Luo, Yan Kang, Lixin Fan, Qiang Yang
Vertical federated learning (VFL) allows an active party with labeled data to leverage auxiliary features from the passive parties to improve model performance. Concerns about the private feature and label leakage in both the training and inference phases of VFL have drawn wide research attention. In this paper, we propose a general privacy-preserving vertical federated deep learning framework called FedPass, which leverages adaptive obfuscation to protect the feature and label simultaneously. Strong privacy-preserving capabilities about private features and labels are theoretically proved (in Theorems 1 and 2). Extensive experimental results with different datasets and network architectures also justify the superiority of FedPass against existing methods in light of its near-optimal trade-off between privacy and model performance.
List of keywords
Machine Learning -> ML: Federated learning
Computer Vision -> CV: Bias, fairness and privacy
2671
Contrastive Label Enhancement
Yifei Wang, Yiyang Zhou, Jihua Zhu, Xinyuan Liu, Wenbiao Yan, Zhiqiang Tian
Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity. Since it is difficult to directly obtain label distributions, many studies are focusing on how to recover label distributions from logical labels, dubbed label enhancement (LE). Existing LE methods estimate label distributions by simply building a mapping relationship between features and label distributions under the supervision of logical labels. They typically overlook the fact that both features and logical labels are descriptions of the instance from different views. Therefore, we propose a novel method called Contrastive Label Enhancement (ConLE) which integrates features and logical labels into the unified projection space to generate high-level features by contrastive learning strategy. In this approach, features and logical labels belonging to the same sample are pulled closer, while those of different samples are projected farther away from each other in the projection space. Subsequently, we leverage the obtained high-level features to gain label distributions through a well-designed training strategy that considers the consistency of label attributes. Extensive experiments on LDL benchmark datasets demonstrate the effectiveness and superiority of our method.
List of keywords
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Multi-view learning
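The idea in the ConLE abstract of pulling together the feature view and the logical-label view of the same sample, while pushing apart views of different samples, can be illustrated with a generic InfoNCE-style loss. The paper's actual loss and projection networks are not given in the abstract, so this is a stand-in under stated assumptions: the inputs are already-projected, non-zero vectors, and the `temperature` value is arbitrary.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(feature_views, label_views, temperature=0.5):
    """Average contrastive loss; row i of both inputs describes sample i."""
    n = len(feature_views)
    loss = 0.0
    for i in range(n):
        logits = [cosine(feature_views[i], label_views[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate label views.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # The positive pair is the label view of the same sample.
        loss += log_denominator - logits[i]
    return loss / n

if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0]]
    aligned_labels = [[1.0, 0.0], [0.0, 1.0]]
    swapped_labels = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(features, aligned_labels))
    print(info_nce(features, swapped_labels))
```

The loss is lower when each sample's two views agree than when they are swapped, which is the behavior a contrastive objective of this kind rewards.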
2676
SAT-Based PAC Learning of Description Logic Concepts
Balder Ten Cate, Maurice Funk, Jean Jung, Carsten Lutz
We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic ELHr based on a SAT solver, and compare its performance to a state-of-the-art learner.
List of keywords
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
2692
A Novel Demand Response Model and Method for Peak Reduction in Smart Grids — PowerTAC
Sanjay Chandlekar, Shweta Jain, Sujit Gujar
One of the widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers’ (agents’) usage patterns in response to the signal from the distribution company. Often, these signals are in the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that depicts the probability of an agent reducing its load as a function of the discounts offered to them. We call it reduction probability (RP). RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS–ExpResponse, that outputs the discounts to each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB–ExpResponse, to learn RRs. Experimentally we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
List of keywords
Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MDA: Energy, environment and sustainability
2693
MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation
Yiheng Zhu, Zhenqiu Ouyang, Ben Liao, Jialu Wu, Yixuan Wu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu
Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made by resorting to generative models for graphs. However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models. The primary challenge to hierarchical generation is the non-differentiable issue caused by the generation of intermediate discrete coarsened graph structures. To sidestep this issue, we cast the tricky hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, implying its high capacity to model data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Sequence and graph learning
2705
An Empirical Study on the Language Modal in Visual Question Answering
Daowan Peng, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Dangyang Chen
Generalizing beyond experience to out-of-distribution data is of great significance in the AI domain. Recently, state-of-the-art Visual Question Answering (VQA) models have performed well on in-domain data partly due to the language prior, which, however, limits their generalizability in real-world situations. It is widely known that the bias-dependency issue is the culprit of such dilemmas. This paper analyzes the problem of language bias from a new perspective and aims to obtain more information about the issue through empirical analysis. We find, specifically, that 1) the question postfix is more essential than the question type in causing the bias issue; 2) word-sequence-related disturbance of the question prevents the VQA model from learning fixed patterns in the original question, hence reducing its reliance on bias. The experimental results show that almost all tested models improve their performance when trained with disturbed questions on VQA-CPv2; LXMERT even achieves a 10-point gain without adopting any de-bias methods. Additionally, we propose a data enhancement method and conduct extensive experiments. The experimental results show that the proposed method can improve performance on the out-of-distribution benchmark. We hope this study inspires novel insights for future research on designing bias-reduction approaches.
List of keywords
Machine Learning -> ML: Multi-modal learning
Natural Language Processing -> NLP: Question answering
2716
LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity
Yuhan Chen, Yihong Luo, Jing Tang, Liang Yang, Siya Qiu, Chuan Wang, Xiaochun Cao
Heterophily has been considered an issue that hurts the performance of Graph Neural Networks (GNNs). To address this issue, some existing work uses a graph-level weighted fusion of the information of multi-hop neighbors to include more nodes with homophily. However, the heterophily might differ among nodes, which requires considering the local topology. Motivated by this, we propose to use the local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module. For better fusion, we propose a novel and efficient Initial Residual Difference Connection (IRDC) to extract more informative multi-hop information. Moreover, we provide a theoretical analysis of the effectiveness of LocalSim in representing node homophily on synthetic graphs. Extensive evaluations over real benchmark datasets show that our proposed method, namely Local Similarity Graph Neural Network (LSGNN), can offer comparable or superior state-of-the-art performance on both homophilic and heterophilic graphs. Meanwhile, the plug-and-play model can significantly boost the performance of existing GNNs.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
2727
Hierarchical Semantic Contrast for Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xiaoqiang Li, Songmin Dai, Jide Li, Tong Liu, Shaorong Xie
Weakly supervised semantic segmentation (WSSS) with image-level annotations has achieved great progress through the class activation map (CAM). Since vanilla CAMs can hardly serve as guidance to bridge the gap between full and weak supervision, recent studies explore semantic representations to make CAM fit for WSSS and demonstrate encouraging results. However, they generally exploit single-level semantics, which may hamper the model from learning a comprehensive semantic structure. Motivated by the prior that each image has multiple levels of semantics, we propose hierarchical semantic contrast (HSC) to ameliorate the above problem. It conducts semantic contrast from a coarse-grained to a fine-grained perspective, including ROI level, class level, and pixel level, helping the model learn a better understanding of object patterns. To further improve CAM quality, building upon HSC, we explore consistency regularization of cross supervision and develop momentum prototype learning to utilize abundant semantics across different images. Extensive studies show that our plug-and-play learning paradigm, HSC, can significantly boost CAM quality on both non-saliency-guided and saliency-guided baselines, and establish new state-of-the-art WSSS performance on the PASCAL VOC 2012 dataset. Code is available at https://github.com/Wu0409/HSC_WSSS.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Scene analysis and understanding
2734
Open Anomalous Trajectory Recognition via Probabilistic Metric Learning
Qiang Gao, Xiaohan Wang, Chaoran Liu, Goce Trajcevski, Li Huang, Fan Zhou
Identifying anomalous trajectories enables appropriate responses to a variety of unusual traffic behaviors or driving patterns. Limited to a closed-set context, existing approaches fail to recognize unknown anomalous trajectories, resulting in an insufficient self-motivated learning paradigm. We investigate the novel Anomalous Trajectory Recognition problem in an Open-world scenario (ATRO) and introduce a novel probabilistic Metric learning model, namely ATROM, to address it. Specifically, ATROM can detect the presence of unknown anomalous behavior in addition to identifying known behavior. It has a Mutual Interaction Distillation that uses contrastive metric learning to explore the interactive semantics regarding the diverse behavioral intents and a Probabilistic Trajectory Embedding that forces the trajectories with distinct behaviors to follow different Gaussian priors. More importantly, ATROM offers a probabilistic metric rule to discriminate between known and unknown behavioral patterns by taking advantage of the approximation of multiple priors. Experimental results on two large-scale trajectory datasets demonstrate the superiority of ATROM in addressing both known and unknown anomalous patterns.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Transportation
2738
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Han Liu, Xingshuo Huang, Xiaotong Zhang, Qimai Li, Fenglong Ma, Wei Wang, Hongyang Chen, Hong Yu, Xianchao Zhang
Decision-based methods have been shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require access to the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it directly affects the query efficiency. Recent works have attempted to utilize gradient priors to help score-based methods obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, and are thus difficult to extend directly to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus retaining the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate convergence, thus improving the query efficiency. Extensive experiments demonstrate that the proposed method significantly outperforms other strong baselines.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
2748
VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning
Xiaofan Li, Yachao Zhang, Shiran Bian, Yanyun Qu, Yuan Xie, Jianping Fan, Zhongchao Shi
Unlike conventional zero-shot learning (CZSL), which only focuses on the recognition of unseen classes by using the classifier trained on seen classes and semantic embeddings, generalized zero-shot learning (GZSL) aims at recognizing both the seen and unseen classes, so it is more challenging due to the extreme training imbalance. Recently, some feature generation methods have introduced metric learning to enhance the discriminability of visual features. Although these methods achieve good results, they focus only on metric learning in the visual feature space to enhance features and ignore the association between the feature space and the semantic space. Since the GZSL method uses semantics as prior knowledge to migrate visual knowledge to unseen classes, the consistency between visual space and semantic space is critical. To this end, we propose relational metric learning, which can relate the metrics in the two spaces and make the distribution of the two spaces more consistent. Based on the generation method and relational metric learning, we propose a novel GZSL method, termed VS-Boost, which can effectively boost the association between vision and semantics. The experimental results demonstrate that our method is effective and achieves significant gains on five benchmark datasets compared with the state-of-the-art methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Computer Vision -> CV: Recognition (object detection, categorization)
2753
Low-Confidence Samples Mining for Semi-supervised Object Detection
Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Jun-Hai Yong, Bin Wang
Reliable pseudo labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo labels with high confidence, which ignore valuable pseudo labels with lower confidence. Additionally, the insufficient excavation for unlabeled data results in an excessively low recall rate thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low confidence pseudo labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch on account of low-resolution feature maps to extract reliable large area instances, the IoUs of which are higher than small area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutually learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Data Mining -> DM: Applications
Data Mining -> DM: Exploratory data mining
2754
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui
Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages from both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between the memory consumption and the hardware utilization, thus automatically generates the distributed computation graph and maximizes the overall system throughput. In addition, OSDP introduces operator splitting to further alleviate peak memory footprints during training with negligible overheads, which enables the trainability of larger models as well as the higher throughput. Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.
List of keywords
Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
Data Mining -> DM: Big data and scalability
2757
Bayesian Optimization with Switching Cost: Regret Analysis and Lookahead Variants
Peng Liu, Haowei Wang, Wei Qiyu
Recently, Bayesian Optimization (BO) has received increasing attention due to its efficiency in optimizing expensive-to-evaluate functions. For some practical problems, it is essential to consider the switching cost between consecutive sampling locations given a total traveling budget. For example, when using a drone to locate cracks in a building wall or search for lost survivors in the wild, the search path needs to be efficiently planned given the limited battery power of the drone. Tackling such problems requires a careful cost-benefit analysis of candidate locations and keeping a balance between exploration and exploitation. In this work, we formulate such a problem as a constrained Markov Decision Process (MDP) and solve it by proposing a new distance-adjusted multi-step look-ahead acquisition function, the distUCB, and using rollout approximation. We also provide a theoretical regret analysis of the distUCB-based Bayesian optimization algorithm. In addition, the empirical performance of the proposed algorithm is tested based on both synthetic and real data experiments, and it shows that our cost-aware non-myopic algorithm performs better than other popular alternatives.
List of keywords
Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Hyperparameter optimization
2758
Learning Survival Distribution with Implicit Survival Function
Yu Ling, Weimin Tan, Bo Yan
Survival analysis aims at modeling the relationship between covariates and event occurrence with some untracked (censored) samples. In implementation, existing methods model the survival distribution with strong assumptions or in a discrete time space for likelihood estimation with censorship, which leads to weak generalization. In this paper, we propose Implicit Survival Function (ISF) based on Implicit Neural Representation for survival distribution estimation without strong assumptions, and employ numerical integration to approximate the cumulative distribution function for prediction and optimization. Experimental results show that ISF outperforms the state-of-the-art methods in three public datasets and has robustness to the hyperparameter controlling estimation precision.
List of keywords
Machine Learning -> ML: Other
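The abstract above mentions using numerical integration to approximate the cumulative distribution function. A minimal sketch of that one step follows, assuming a known density as a stand-in for the learned model; the function names, the trapezoid rule, and the exponential density are illustrative assumptions, not the authors' implementation.

```python
import math

def cdf_by_trapezoid(density, t, steps=1000):
    """Approximate F(t) = integral of density over [0, t] with the trapezoid rule.

    `density` is any callable returning a non-negative value; in ISF it would
    be produced by a neural network, here it is a plain Python function.
    """
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.5 * (density(0.0) + density(t))
    for i in range(1, steps):
        total += density(i * h)
    return total * h

def survival(density, t, steps=1000):
    """Survival function S(t) = 1 - F(t)."""
    return 1.0 - cdf_by_trapezoid(density, t, steps)

# Stand-in density (assumption): exponential with rate 1, so F(t) = 1 - exp(-t).
def exp_density(x):
    return math.exp(-x)
```

The `steps` argument plays the role of a precision knob: more steps give a closer approximation at higher cost.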
2759
Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions
Alessandro Daniele, Tommaso Campari, Sagar Malhotra, Luciano Serafini
Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL simultaneously learns the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.
List of keywords
Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
2774
Approximating Fair Division on D-Claw-Free Graphs
Zbigniew Lonc
We study the problem of fair allocation of indivisible goods that form a graph and the bundles that are distributed to agents are connected subgraphs of this graph. We focus on the maximin share and the proportional fairness criteria. It is well-known that allocations satisfying these criteria may not exist for many graphs including complete graphs and cycles. Therefore, it is natural to look for approximate allocations, i.e., allocations guaranteeing each agent a certain portion of the value that is satisfactory to her. In this paper we consider the class of graphs of goods which do not contain a star with d+1 edges (where d > 1) as an induced subgraph. For this class of graphs we prove that there is an allocation assigning each agent a connected bundle of value at least 1/d of her maximin share. Moreover, for the same class of graphs of goods, we show a theorem which specifies what fraction of the proportional share can be guaranteed to each agent if the values of single goods for the agents are bounded by a given fraction of this share.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
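To make the maximin share notion in the abstract above concrete, the following sketch computes it by brute force for the simplest connected setting, a path of goods, where connected bundles are contiguous segments. A path contains no induced star with three edges, so it falls in the d = 2 case. The function name and the example values are illustrative assumptions, and the code assumes non-negative additive values.

```python
from itertools import combinations

def maximin_share_on_path(values, n_agents):
    """Maximin share of one agent when goods lie on a path.

    The agent cuts the path into n_agents contiguous bundles and receives
    the least valuable one; the maximin share is the best value she can
    guarantee this way. Brute force over all cut positions.
    """
    m = len(values)
    if n_agents > m:
        return 0  # at least one bundle is empty, so the minimum is 0
    best = None
    # Choose n_agents - 1 cut points between consecutive goods.
    for cuts in combinations(range(1, m), n_agents - 1):
        bounds = (0, *cuts, m)
        worst = min(sum(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = worst if best is None else max(best, worst)
    return best

mms = maximin_share_on_path([3, 1, 2, 4], 2)  # -> 4
threshold = mms / 2  # the 1/d fraction of the maximin share for d = 2
```

For the values `[3, 1, 2, 4]` and two agents, the best cut is after the second or third good, giving a maximin share of 4.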
2777
Learning Gaussian Mixture Representations for Tensor Time Series Forecasting
Jiewen Deng, Renhe Jiang, Jinliang Deng, Xuan Song
Tensor time series (TTS) data, a generalization of one-dimensional time series on a high-dimensional space, is ubiquitous in real-world scenarios, especially in monitoring systems involving multi-source spatio-temporal data (e.g., transportation demands and air pollutants). Compared to modeling time series or multivariate time series, which has received much attention and achieved tremendous progress in recent years, tensor time series has been paid less effort. Properly coping with the tensor time series is a much more challenging task, due to its high-dimensional and complex inner structure. In this paper, we develop a novel TTS forecasting framework, which seeks to individually model each heterogeneity component implied in the time, the location, and the source variables. We name this framework as GMRL, short for Gaussian Mixture Representation Learning. Experiment results on two real-world TTS datasets verify the superiority of our approach compared with the state-of-the-art baselines. Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle the heterogeneity components with different evolutions.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Time series and data streams
2780
Globally Consistent Federated Graph Autoencoder for Non-IID Graphs
Kun Guo, Yutong Fang, Qingqing Huang, Yuting Liang, Ziyao Zhang, Wenyu He, Liu Yang, Kai Chen, Ximeng Liu, Wenzhong Guo
Graph neural networks (GNNs) have been applied successfully in many machine learning tasks due to their advantages in utilizing neighboring information. Recently, with the global enactment of privacy protection regulations, federated GNNs have gained increasing attention in academia and industry. However, the graphs owned by different participants could be non-independently-and-identically distributed (non-IID), leading to the deterioration of federated GNNs’ accuracy. In this paper, we propose a globally consistent federated graph autoencoder (GCFGAE) to overcome the non-IID problem in unsupervised federated graph learning via three innovations. First, by integrating federated learning with split learning, we train a unique global model instead of FedAvg-styled global and local models, yielding results consistent with that of the centralized GAE. Second, we design a collaborative computation mechanism considering overlapping vertices to reduce communication overhead during forward propagation. Third, we develop a layer-wise and block-wise gradient computation strategy to reduce the space and communication complexity during backward propagation. Experiments on real-world datasets demonstrate that GCFGAE achieves not only higher accuracy but also around 500 times lower communication overhead and 1000 times smaller space overhead than existing federated GNN models.
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Mining graphs
2788
Video Object Segmentation in Panoptic Scenes
Yuanyou Xu, Zongxin Yang, Yi Yang
In this paper, we introduce video object segmentation (VOS) to panoptic scenes and present a large-scale benchmark as well as a baseline method for it. Previous benchmarks for VOS with sparse annotations are not sufficient to train or evaluate a model that needs to process all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers various real-world object categories which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. Considering tracking and segmenting numerous dense objects in panoptic scenes are more challenging than processing sparse objects, we propose a strong baseline method named panoptic object association with transformers (PAOT). A pyramid architecture and an efficient transformer structure are proposed for multi-scale object matching. In addition, panoptic identification embeddings are generated by decoupled identity banks for thing and stuff objects for panoptic object association. Experimental results show that VIPOSeg can not only boost the performance of VOS models by panoptic training but also evaluate them comprehensively in panoptic scenes. The evaluation results show that previous methods for generic VOS still need to improve in performance and efficiency when dealing with panoptic scenes, while our PAOT method achieves SOTA performance with good efficiency on both VIPOSeg and previous VOS benchmarks.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding   
2789
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen
Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities
2793
Revisiting the Evaluation of Deep Learning-Based Compiler Testing
Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Chengnian Sun, Shing-Chi CHEUNG
A high-quality program generator is essential to effective automated compiler testing. Engineering such a program generator is difficult, time-consuming, and specific to the language under testing, thus requiring tremendous efforts from human experts with language-specific domain knowledge. To avoid repeatedly writing program generators for different languages, researchers recently proposed a language-agnostic approach based on deep learning techniques to automatically learn a program generator (referred to as DLG) from existing programs. Evaluations show that DLGs outperform Language-Specific Program Generators (LSGs) in testing compilers. However, we argue that it is unfair to use LSGs as baselines to evaluate DLGs. LSGs aim to validate compiler optimizations by only generating compilable, well-defined test programs; this restriction inevitably impairs the diversity of the language features used in the generated programs. In contrast, DLGs do not aim to validate the correctness of compiler optimizations, and their generated programs are not guaranteed to be well-defined or even compilable. Therefore, it is not surprising that DLG-generated programs are more diverse in terms of used language features than LSG-generated ones. This study revisits the evaluation of DLGs, and proposes a new, fair, simple yet strong baseline named Kitten for evaluating DLGs. Given a dataset consisting of human-written programs, instead of using deep learning techniques to learn a program generator, Kitten directly derives new programs by mutating the programs in the dataset. Extensive experiments with more than 1,500 CPU-hours demonstrate that the state-of-the-art DLGs fail to compete against such a simple baseline: 3 vs. 1,750 hang bugs, 1 vs. 34 distinct compiler crashes. We believe that DLGs still have considerable room for improvement.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Software engineering
2815
Pyramid Diffusion Models For Low-light Image Enhancement
Dewei Zhou, Zongxin Yang, Yi Yang
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
2816
Genetic Prompt Search via Exploiting Language Model Probabilities
Jiangjiang Zhao, Zhuoran Wang, Fangchun Yang
Prompt tuning for large-scale pretrained language models (PLMs) has shown remarkable potential, especially in low-resource scenarios such as few-shot learning. Moreover, derivative-free optimisation (DFO) techniques make it possible to tune prompts for a black-box PLM to better fit downstream tasks. However, there are usually preconditions to apply existing DFO-based prompt tuning methods, e.g. the backbone PLM needs to provide extra APIs so that hidden states (and/or embedding vectors) can be injected into it as continuous prompts, or carefully designed (discrete) manual prompts need to be available beforehand, serving as the initial states of the tuning algorithm. To waive such preconditions and make DFO-based prompt tuning ready for general use, this paper introduces a novel genetic algorithm (GA) that evolves from empty prompts, and uses the predictive probabilities derived from the backbone PLM(s) on the basis of a (few-shot) training set to guide the token selection process during prompt mutations. Experimental results on diverse benchmark datasets show that the proposed precondition-free method significantly outperforms the existing DFO-style counterparts that require preconditions, including black-box tuning, genetic prompt search and gradient-free instructional prompt search.
List of keywords
Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Other
2836
The Hardness of Reasoning about Probabilities and Causality
Benito van der Zander, Markus Bläser, Maciej Liskiewicz
We study formal languages which are capable of fully expressing quantitative probabilistic reasoning and do-calculus reasoning for causal effects, from a computational complexity perspective. Our main focus is on the satisfiability problems whose instance formulas allow expressing many tasks in probabilistic and causal inference. The main contribution of this work is establishing the exact computational complexity of these satisfiability problems. We introduce a new natural complexity class, named succETR, which can be viewed as a succinct variant of the well-studied class ∃R, and show that the problems are complete for succETR. Our results imply even stronger algorithmic limitations than were proven by Fagin, Halpern, and Megiddo (1990) and Mossé, Ibeling, and Icard (2022) for some variants of the standard languages used commonly in probabilistic and causal inference.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Knowledge Representation and Reasoning -> KRR: Causality
Machine Learning -> ML: Causality
2840
Part Aware Contrastive Learning for Self-Supervised Action Recognition
Yilei Hua, Wenhan Wu, Ce Zheng, Aidong Lu, Mengyuan Liu, Chen Chen, Shiqian Wu
In recent years, remarkable results have been achieved in self-supervised action recognition using skeleton sequences with contrastive learning. It has been observed that the semantic distinction of human action features is often represented by local body parts, such as legs or hands, which are advantageous for skeleton-based action recognition. This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations. To achieve this, a multi-head attention mask module is employed to learn the soft attention mask features from the skeletons, suppressing non-salient local features while accentuating local salient features, thereby bringing similar local features closer in the feature space. Additionally, ample contrastive pairs are generated by expanding contrastive pairs based on salient and non-salient features with global features, which guide the network to learn the semantic representations of the entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR learns local features under different data augmentation views. The experiment results demonstrate that the inclusion of local feature similarity significantly enhances skeleton-based action representation. Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets. The code and settings are available at this repository: https://github.com/GitHubOfHyl97/SkeAttnCLR.
List of keywords
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Self-supervised Learning
2841
ODEE: A One-Stage Object Detection Framework for Overlapping and Nested Event Extraction
Jinzhong Ning, Zhihao Yang, Zhizheng Wang, Yuanyuan Sun, Hongfei Lin
The task of extracting overlapping and nested events has received significant attention in recent times, as prior research has primarily focused on extracting flat events, overlooking the intricacies of overlapping and nested occurrences. In this work, we present a new approach to Event Extraction (EE) by reformulating it as an object detection task on a table of token pairs. Our proposed one-stage event extractor, called ODEE, can handle overlapping and nested events. The model is designed with a vertex-based tagging scheme and two auxiliary tasks of predicting the spans and types of event trigger words and argument entities, leveraging the full span information of event elements. Furthermore, in the training stage, we introduce a negative sampling method for table cells to address the imbalance problem of positive and negative table cell tags, meanwhile improving computational efficiency. Empirical evaluations demonstrate that ODEE achieves the state-of-the-art performance on three benchmarks for overlapping and nested EE (i.e., FewFC, Genia11, and Genia13). Furthermore, ODEE outperforms current state-of-the-art methods in terms of both number of parameters and inference speed, indicating its high computational efficiency. To facilitate future research in this area, the codes are publicly available at https://github.com/NingJinzhong/ODEE.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Named entities
2847
Approximate Inference in Logical Credal Networks
Radu Marinescu, Haifeng Qian, Alexander Gray, Debarun Bhattacharjya, Francisco Barahona, Tian Gao, Ryan Riegel
The Logical Credal Network or LCN is a recent probabilistic logic designed for effective aggregation and reasoning over multiple sources of imprecise knowledge. An LCN specifies a set of probability distributions over all interpretations of a set of logical formulas for which marginal and conditional probability bounds on their truth values are known. Inference in LCNs involves the exact solution of a non-convex non-linear program defined over an exponentially large number of non-negative real valued variables and, therefore, is limited to relatively small problems. In this paper, we present ARIEL — a novel iterative message-passing scheme for approximate inference in LCNs. Inspired by classical belief propagation for graphical models, our method propagates messages that involve solving considerably smaller local non-linear programs. Experiments on several classes of LCNs demonstrate clearly that ARIEL yields high quality solutions compared with exact inference and scales to much larger problems than previously considered.
List of keywords
Uncertainty in AI -> UAI: Graphical models
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Uncertainty in AI -> UAI: Inference
2849
Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks
Yuchen Wang, Kexin Shi, Chengzhuo Lu, Yuguo Liu, Malu Zhang, Hong Qu
The brain-inspired spiking neural networks (SNNs) are receiving increasing attention due to their asynchronous event-driven characteristics and low power consumption. As attention mechanisms recently become an indispensable part of sequence dependence modeling, the combination of SNNs and attention mechanisms holds great potential for energy-efficient and high-performance computing paradigms. However, the existing works cannot benefit from both temporal-wise attention and the asynchronous characteristic of SNNs. To fully leverage the advantages of both SNNs and attention mechanisms, we propose an SNNs-based spatial-temporal self-attention (STSA) mechanism, which calculates the feature dependence across the time and space domains without destroying the asynchronous transmission properties of SNNs. To further improve the performance, we also propose a spatial-temporal relative position bias (STRPB) for STSA to consider the spatiotemporal position of spikes. Based on the STSA and STRPB, we construct a spatial-temporal spiking Transformer framework, named STS-Transformer, which is powerful and enables SNNs to work in an asynchronous event-driven manner. Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our experimental results can outperform other state-of-the-art models.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Cognitive systems
2869
Teacher Assistant-Based Knowledge Distillation Extracting Multi-level Features on Single Channel Sleep EEG
Heng Liang, Yucheng Liu, Haichao Wang, Ziyu Jia
Sleep stage classification is of great significance to the diagnosis of sleep disorders. However, existing sleep stage classification models based on deep learning are usually relatively large in size (wider and deeper), which makes them hard to be deployed on wearable devices. Therefore, it is a challenge to lighten the existing sleep stage classification models. In this paper, we propose a novel general knowledge distillation framework for sleep stage classification tasks called SleepKD. Our SleepKD, composed of the multi-level module, teacher assistant module, and other knowledge distillation modules, aims to lighten large-scale sleep stage classification models. Specifically, the multi-level module is able to transfer the multi-level knowledge extracted from sleep signals by the teacher model (large-scale model) to the student model (lightweight model). Moreover, the teacher assistant module bridges the large gap between the teacher and student network, and further improves the distillation. We evaluate our method on two public sleep datasets (Sleep-EDF and ISRUC-III). Compared to the baseline methods, the results show that our knowledge distillation framework achieves state-of-the-art performance. SleepKD can significantly lighten the sleep model while maintaining its classification performance.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Humans and AI -> HAI: Applications
2873
CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing
Philipp Altmann, Leonard Feuchtinger, Fabian Ritz, Jonas Nüßlein, Thomy Phan, Claudia Linnhoff-Popien
The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, only containing crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation- and action-spaces and provide methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Robustness
2877
SemiGNN-PPI: Self-Ensembling Multi-Graph Neural Network for Efficient and Generalizable Protein–Protein Interaction Prediction
Ziyuan Zhao, Peisheng Qian, Xulei Yang, Zeng Zeng, Cuntai Guan, Wai Leong Tam, Xiaoli Li
Protein-protein interactions (PPIs) are crucial in various biological processes and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods suffer from significant performance degradation under complex real-world scenarios due to various factors, e.g., label scarcity and domain shift. In this paper, we propose a self-ensembling multi-graph neural network (SemiGNN-PPI) that can effectively predict PPIs while being both efficient and generalizable. In SemiGNN-PPI, we not only model the correlations between proteins but also explore the label dependencies by constructing and processing multiple graphs from the perspectives of both features and labels in the graph learning process. We further marry GNN with Mean Teacher to effectively leverage unlabeled graph-structured PPI data for self-ensemble graph learning. We also design multiple graph consistency constraints to align the student and teacher graphs in the feature embedding space, enabling the student model to better learn from the teacher model by incorporating more relationships. Extensive experiments on PPI datasets of different scales with different evaluation settings demonstrate that SemiGNN-PPI outperforms state-of-the-art PPI prediction methods, particularly in challenging scenarios such as training with limited annotations and testing on unseen data.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine
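The abstract above pairs a GNN student with a Mean Teacher. As a minimal sketch of the general Mean Teacher idea only (not the SemiGNN-PPI implementation), the teacher's weights are an exponential moving average of the student's weights. The function name `ema_update`, the flat list of floats standing in for model parameters, and the decay value are all illustrative assumptions.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an exponential moving average.

    teacher, student: equal-length lists of floats (hypothetical stand-ins
    for model parameters). decay close to 1 makes the teacher change slowly.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher drifts toward the student over repeated updates.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

In a typical Mean Teacher setup, only the student receives gradient updates, and a consistency loss pulls its predictions on unlabeled data toward the teacher's.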
2883
Locate, Refine and Restore: A Progressive Enhancement Network for Camouflaged Object Detection
Xiaofei Li, Jiaxin Yang, Shuohao Li, Jun Lei, Jun Zhang, Dong Chen
Camouflaged Object Detection (COD) aims to accurately segment objects that are visually integrated into their surroundings. Most existing methods tackle this issue with a single-step framework, which tends to degrade in performance when faced with small objects, low-contrast objects, and objects with large variation in appearance. In this paper, we propose a novel Progressive Enhancement Network (PENet) for COD by imitating the human visual detection system, which follows a three-step detection fashion: locate objects, refine textures, and restore boundaries. Specifically, our PENet contains three key modules, i.e., the object location module (OLM), the group attention module (GAM), and the context feature restoration module (CFRM). The OLM is designed to position the object globally, the GAM is developed to refine both high-level semantic and low-level texture feature representation, and the CFRM is leveraged to effectively aggregate multi-level features for progressively restoring clear boundaries. Extensive experimental results demonstrate that our PENet significantly outperforms 31 state-of-the-art methods on four widely used benchmark datasets. The code will be open source soon.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Recognition (object detection, categorization)
2905
Computing Abductive Explanations for Boosted Regression Trees
Gilles Audemard, Steve Bellart, Jean-Marie Lagniez, Pierre Marquis
We present two algorithms for generating (resp. evaluating) abductive explanations for boosted regression trees. Given an instance x and an interval I containing its value F(x) for the boosted regression tree F at hand, the generation algorithm returns a (most general) term t over the Boolean conditions in F such that every instance x’ satisfying t is such that F(x’) belongs to I. The evaluation algorithm tackles the corresponding inverse problem: given F, x, and a term t over the Boolean conditions in F such that t covers x, find the least interval I_t such that for every instance x’ covered by t we have F(x’) in I_t. Experiments on various datasets show that the two algorithms are practical enough to be used for generating (resp. evaluating) abductive explanations for boosted regression trees of significant size.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Machine Learning -> ML: Regression
2907
Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
Vishakha Patil, Vineet Nair, Ganesh Ghalme, Arindam Khan
We study the Improving Multi-Armed Bandit problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. We study the tension between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time and b) ensuring that arms with better long-term rewards get sufficient pulls even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm until it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer $\Omega(T)$ policy regret and $\Omega(k)$ competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is $O(k)$.
List of keywords
Machine Learning -> ML: Online learning
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Uncertainty in AI -> UAI: Sequential decision making
2911
Meta-Tsallis-Entropy Minimization: a new Self-Training approach for domain adaptation on text classification
Menglong Lu, Zhen Huang, Zhiliang Tian, Yunxiang Zhao, Xuanyu Fei, Dongsheng Li
Text classification is a fundamental task for natural language processing, and adapting text classification models across domains has broad applications. Self-training generates pseudo-examples from the model’s predictions and iteratively trains on the pseudo-examples, i.e., minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus, self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy minimization (MTEM). MTEM uses an instance-adaptive Tsallis entropy to replace the Gibbs entropy and a meta-learning algorithm to optimize the instance-adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose a technique to approximate the second-order derivative involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model’s prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT by an average of 4 percent.
List of keywords
Natural Language Processing -> NLP: Text classification
Natural Language Processing -> NLP: Applications
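For readers unfamiliar with the entropy that the MTEM abstract substitutes for the Gibbs entropy, here is a small sketch of the standard Tsallis entropy, S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers the Gibbs (Shannon) entropy as q approaches 1. How MTEM chooses q per instance is not stated in the abstract, so the q values below are purely illustrative and the function name `tsallis_entropy` is an assumption.

```python
import math


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy of a discrete distribution.

    For q == 1 the limit is the Gibbs/Shannon entropy (natural log).
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# Illustrative comparison on a confident and an uncertain prediction.
confident = [0.9, 0.1]
uncertain = [0.5, 0.5]
for q in (1.0, 2.0, 5.0):
    print(q, tsallis_entropy(confident, q), tsallis_entropy(uncertain, q))
```

The index q changes how strongly the entropy reacts to low-probability classes, which is the degree of freedom an instance-adaptive scheme can tune.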
2912
The Parameterized Complexity of Finding Concise Local Explanations
Sebastian Ordyniak, Giacomo Paesani, Stefan Szeider
We consider the computational problem of finding a smallest local explanation (anchor) for classifying a given feature vector (example) by a black-box model. After showing that the problem is NP-hard in general, we study various natural restrictions of the problem in terms of problem parameters to see whether these restrictions make the problem fixed-parameter tractable or not. We draw a detailed and systematic complexity landscape for combinations of parameters, including the size of the anchor, the size of the anchor’s coverage, and parameters that capture structural aspects of the problem instance, including rank-width and maximum difference.
List of keywords
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
2927
ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks
Alexandra Baier, Decky Aspandi, Steffen Staab
Multistep prediction models are essential for the simulation and model-predictive control of dynamical systems. Verifying the safety of such models is a multi-faceted problem requiring both system-theoretic guarantees as well as establishing trust with human users. In this work, we propose a novel approach, ReLiNet (Recurrent Linear Parameter Varying Network), to ensure safety for multistep prediction of dynamical systems. Our approach simplifies a recurrent neural network to a switched linear system that is constrained to guarantee exponential stability, which acts as a surrogate for safety from a system-theoretic perspective. Furthermore, ReLiNet’s computation can be reduced to a single linear model for each time step, resulting in predictions that are explainable by definition, thereby establishing trust from a human-centric perspective. Our quantitative experiments show that ReLiNet achieves prediction accuracy comparable to that of state-of-the-art recurrent neural networks, while achieving more faithful and robust explanations compared to the model-agnostic explanation method of LIME.
List of keywords
Machine Learning -> ML: Recurrent networks
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
2929
Complex Contagion Influence Maximization: A Reinforcement Learning Approach
Haipeng Chen, Bryan Wilder, Wei Qiu, Bo An, Eric Rice, Milind Tambe
In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold), which assume that individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, very few studies address complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves state-of-the-art performance on 9 real-world networks.
List of keywords
Search -> S: Combinatorial search and optimisation
Machine Learning -> ML: Reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Web and social networks
2969
Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++
Barath Mohan Umapathi, Kushal Chauhan, Pradeep Shenoy, Devarajan Sridharan
Reliable outlier detection is critical for real-world deployment of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as being impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations — “shaking” and “stirring” — which ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are inexpensive and readily computed at evaluation time. We test our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection, particularly on datasets with complex, natural images. We also show that our solutions work well with other types of generative models (generative flows and variational autoencoders) and that their efficacy is governed by each model’s reliance on local dependencies. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep generative models.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Robustness
2989
ICDA: Illumination-Coupled Domain Adaptation Framework for Unsupervised Nighttime Semantic Segmentation
Chenghao Dong, Xuejing Kang, Anlong Ming
The performance of nighttime semantic segmentation has been significantly improved thanks to recent unsupervised methods. However, these methods still suffer from complex domain gaps, i.e., the challenging illumination gap and the inherent dataset gap. In this paper, we propose the illumination-coupled domain adaptation framework (ICDA) to effectively avoid the illumination gap and mitigate the dataset gap by coupling daytime and nighttime images as a whole with semantic relevance. Specifically, we first design a new composite enhancement method (CEM) that considers not only illumination but also spatial consistency to construct the source and target domain pairs, which provides the basic adaptation unit for our ICDA. Next, to avoid the illumination gap, we devise the Deformable Attention Relevance (DAR) module to capture the semantic relevance inside each domain pair, which can couple the daytime and nighttime images at the feature level and adaptively guide the predictions of nighttime images. Besides, to mitigate the dataset gap and acquire domain-invariant semantic relevance, we propose the Prototype-based Class Alignment (PCA) module, which improves the usage of category information and performs fine-grained alignment. Extensive experiments show that our method reduces the complex domain gaps and achieves state-of-the-art performance for nighttime semantic segmentation.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
2998
Efficient NLP Model Finetuning via Multistage Data Filtering
Xu Ouyang, Shahina Mohd Azam Ansari, Felix Lin, Yangfeng Ji
As model finetuning is central to modern NLP, we set out to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To this end, we set out to filter training examples in a streaming fashion, in tandem with training the target model. Our two key techniques are: (1) automatically determining a training loss threshold for skipping backward training passes; (2) running a meta predictor for further skipping forward training passes. We integrate the above techniques in a holistic, three-stage training process. On a diverse set of benchmarks, our method reduces the required training examples by up to 5.3× and training time by up to 6.8×, while only seeing minor accuracy degradation. Our method is effective even for training one epoch, where each training example is encountered only once. It is simple to implement and is compatible with the existing finetuning techniques.
List of keywords
Natural Language Processing -> NLP: Text classification
Machine Learning -> ML: Automated machine learning
3002
Scalable Verification of Strategy Logic by Three-Valued Abstraction
Francesco Belardinelli, Angelo Ferrando, Wojciech Jamroga, Vadim Malvone, Aniello Murano
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. Several fragments of this logic have been defined to tackle this issue, but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
3034
Minimizing Reachability Times on Temporal Graphs via Shifting Labels
Argyrios Deligkas, Eduard Eiben, George Skretas
We study how we can accelerate the spreading of information in temporal graphs via shifting operations, a problem that captures real-world applications ranging from information flows to distribution schedules. In a temporal graph there is a set of fixed vertices and the available connections between them change over time in a predefined manner. We observe that, in some cases, shifting some connections, i.e., advancing or delaying them, can decrease the time required to reach another vertex from some vertex (source). We study how we can minimize the maximum time a set of sources needs to reach every vertex, when we are allowed to shift some of the connections. If we restrict the allowed number of changes, we prove that, already for a single source, the problem is NP-hard, and W[2]-hard when parameterized by the number of changes. Then we focus on an unconstrained number of changes. We derive a polynomial-time algorithm when there is one source. When there are two sources, we show that the problem becomes NP-hard; on the other hand, we design an FPT algorithm parameterized by the treewidth of the graph plus the lifetime of the optimal solution, that works for any number of sources. Finally, we provide polynomial-time algorithms for several graph classes.
List of keywords
Planning and Scheduling -> PS: Theoretical foundations of planning
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Scheduling
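To make the shifting idea in the abstract above concrete, here is a toy sketch that computes earliest arrival times from one source in a small temporal graph and shows how delaying a single connection changes reachability. It is not the paper's algorithm. The assumptions are loud ones: edges are undirected triples `(u, v, t)`, the source is available at time 0, and temporal paths must use strictly increasing time labels (conventions differ across the literature).

```python
import math


def earliest_arrival(vertices, edges, source):
    """Earliest arrival time at every vertex from `source`.

    edges: iterable of (u, v, t) undirected connections available at time t.
    A connection can be used only if its endpoint was reached strictly
    before t (assumed strict temporal paths).
    """
    arrival = {v: math.inf for v in vertices}
    arrival[source] = 0
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if arrival[u] < t < arrival[v]:
            arrival[v] = t
        elif arrival[v] < t < arrival[u]:
            arrival[u] = t
    return arrival


vertices = ["a", "b", "c"]
original = [("a", "b", 3), ("b", "c", 2)]
shifted = [("a", "b", 3), ("b", "c", 4)]  # delay the b-c connection by 2

print(max(earliest_arrival(vertices, original, "a").values()))  # c is unreachable
print(max(earliest_arrival(vertices, shifted, "a").values()))
```

In the original labeling the b-c connection occurs before a can reach b, so c is never reached; after the shift every vertex is reached by time 4.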
3037
RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models
Xingchen Zhou, Ying He, F Richard Yu, Jianqiang Li, You Li
The emergence of Neural Radiance Fields (NeRF) has promoted the development of synthesized high-fidelity views of the intricate real world. However, it is still a very demanding task to repaint the content in NeRF. In this paper, we propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes. Our work leverages existing diffusion models to guide changes in the designated 3D content. Specifically, we first semantically select the object we want to modify, and a pre-trained diffusion model then guides the NeRF model to generate new 3D objects, which can improve the editability, diversity, and application range of NeRF. Experimental results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts, including editing appearance, shape, etc. We validate our method on real-world datasets and synthetic-world datasets for these editing tasks.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
3040
Dual Video Summarization: From Frames to Captions
Zhenzhen Hu, Zhenshan Wang, Zijie Song, Richang Hong
Video summarization and video captioning both condense the video content from the perspective of visual and text modes, i.e., keyframe selection and language description generation. Existing video-and-language learning models commonly sample multiple frames for training instead of observing all. These sampled deputies greatly improve computational efficiency, but do they represent the original video content enough with no more redundancy? In this work, we propose a dual video summarization framework and verify it in the context of video captioning. Given the video frames, we first extract the visual representation based on the ViT model fine-tuned on the video-text domain. Then we summarize the keyframes according to the frame-level score. To compress the number of keyframes as much as possible while ensuring the quality of captioning, we learn a cross-modal video summarizer to select the most semantically consistent frames according to the pseudo score label. The top $K$ frames ($K$ is no more than $3\%$ of the entire video) are chosen to form the video representation. Moreover, to evaluate the static appearance and temporal information of video, we design the ranking scheme of video representation from two aspects: feature-oriented and sequence-oriented. Finally, we generate the descriptions with a lightweight LSTM decoder. The experimental results on the MSR-VTT and MSVD datasets reveal that, for a generative task such as video captioning, a small number of keyframes can convey the same semantic information and perform as well on captioning as the original sampling, or even better.
List of keywords
Computer Vision -> CV: Vision and language 
Computer Vision -> CV: Video analysis and understanding   
3049
Neuro-Symbolic Class Expression Learning
Caglar Demir, Axel-Cyrille Ngonga Ngomo
Deep learning based models have been effectively applied to tackle various problems in many disciplines. Yet, their predictions are often at most post-hoc and locally explainable. In contrast, predicted class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic models have been successfully applied to learn class expressions, their large-scale applications have been hindered by their impractical runtimes. Arguably, the reliance on myopic heuristic functions contributes to this limitation. We propose a novel neuro-symbolic class expression learning model, Drill, to mitigate this limitation. By learning non-myopic heuristic functions with deep Q-learning, Drill efficiently steers the standard search procedure in a quasi-ordered search space towards goal states. Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that Drill converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirm that Drill converges to goal states significantly faster (p-value < 1\%) than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of Drill, including pre-trained models, training and evaluation scripts.
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Deep reinforcement learning
3072
Learning Preference Models with Sparse Interactions of Criteria
Margot Herin, Patrice Perny, Nataliya Sokolovska
Multicriteria decision making requires defining the result of conflicting and possibly interacting criteria. Allowing criteria interactions in a decision model increases the complexity of the preference learning task due to the combinatorial nature of the possible interactions. In this paper, we propose an approach to learn a decision model in which the interaction pattern is revealed from preference data and kept as simple as possible. We consider weighted aggregation functions like multilinear utilities or Choquet integrals, admitting representations including non-linear terms measuring the joint benefit or penalty attached to some combinations of criteria. The weighting coefficients, known as Möbius masses, model positive or negative synergies among criteria. We propose an approach to learn the Möbius masses, based on iteratively reweighted least squares for sparse recovery, and dualization to improve scalability. This approach is applied to learn sparse representations of the multilinear utility model and conjunctive/disjunctive forms of the discrete Choquet integral from preference examples, in aggregation problems possibly involving more than 20 criteria.
List of keywords
Machine Learning -> ML: Learning preferences or rankings
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Uncertainty in AI -> UAI: Decision and utility theory
3073
Simplification and Improvement of MMS Approximation
Hannaneh Akrami, Jugal Garg, Eklavya Sharma, Setareh Taki
We consider the problem of fairly allocating a set of indivisible goods among n agents with additive valuations, using the popular fairness notion of maximin share (MMS). Since MMS allocations do not always exist, a series of works provided existence results and algorithms for approximate MMS allocations. The current best approximation factor for which existence is known is (3/4 + 1/(12n)) [Garg and Taki, 2021]. Most of these results are based on complicated analyses, especially those providing a factor better than 2/3. Moreover, since no tight example is known for the Garg-Taki algorithm, it is unclear whether this is the best factor of this approach. In this paper, we significantly simplify the analysis of this algorithm and also improve the existence guarantee to a factor of (3/4 + min(1/36, 3/(16n-4))). For small n, this provides a noticeable improvement. Furthermore, we present a tight example of this algorithm, showing that this may be the best factor one can hope for with the current techniques.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
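The two approximation factors quoted in the MMS abstract above can be compared with exact arithmetic. The sketch below simply evaluates the two expressions as stated, reading the earlier factor as 3/4 + 1/(12n); the function names and the chosen values of n are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def garg_taki_factor(n):
    """Earlier existence guarantee: 3/4 + 1/(12n)."""
    return Fraction(3, 4) + Fraction(1, 12 * n)


def improved_factor(n):
    """Guarantee stated in the abstract: 3/4 + min(1/36, 3/(16n - 4))."""
    return Fraction(3, 4) + min(Fraction(1, 36), Fraction(3, 16 * n - 4))


# Print both factors and their gap for a few values of n.
for n in (3, 4, 7, 10, 100):
    old, new = garg_taki_factor(n), improved_factor(n)
    print(n, old, new, float(new - old))
```

Under this reading the two expressions coincide at n = 3, and the min term switches from 1/36 to 3/(16n - 4) once n exceeds 7.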
3078
Negative Flux Aggregation to Estimate Feature Attributions
Xin Li, Deng Pan, Chengyin Li, Yao Qiang, Dongxiao Zhu
There are increasing demands for understanding the behavior of deep neural networks (DNNs), spurred by growing security and/or transparency concerns. Due to the multi-layer nonlinearity of deep neural network architectures, explaining DNN predictions remains an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input features’ attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate the attribution map. Unlike previous techniques, ours neither relies on fitting a surrogate model nor needs any path integration of gradients. Both qualitative and quantitative experiments demonstrate the superior performance of NeFLAG in generating more faithful attribution maps than the competing methods.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
3093
Spotlight News Driven Quantitative Trading Based on Trajectory Optimization
Mengyuan Yang, Qianqiao Liang, Xiaolin Zheng, Mengying Zhu, Menghan Wang
News-driven quantitative trading (NQT) has been widely studied in recent years. Most existing NQT methods follow a two-step paradigm, i.e., first analyzing markets by a financial prediction task and then making trading decisions, which is doomed to failure due to the nearly futile financial prediction task. To bypass the financial prediction task, we focus on the reinforcement learning (RL) based NQT paradigm, which leverages news to make profitable trading decisions directly. In this paper, we propose a novel NQT framework, SpotlightTrader, based on decision trajectory optimization, which can effectively stitch together a continuous and flexible sequence of trading decisions to maximize profits. In addition, we enhance this framework by constructing a spotlight-driven state trajectory that obeys a stochastic process with irregular abrupt jumps caused by spotlight news. Furthermore, in order to adapt to non-stationary financial markets, we propose an effective training pipeline for this framework, which blends offline pretraining with online finetuning to balance exploration and exploitation effectively during online trading. Extensive experiments on three real-world datasets demonstrate our proposed model’s superiority over the state-of-the-art NQT methods.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning
3095
Learning Small Decision Trees with Large Domain
Eduard Eiben, Sebastian Ordyniak, Giacomo Paesani, Stefan Szeider
One favors decision trees (DTs) of the smallest size or depth to facilitate explainability and interpretability. However, learning such an optimal DT from data is well-known to be NP-hard. To overcome this complexity barrier, Ordyniak and Szeider (AAAI 21) initiated the study of optimal DT learning under the parameterized complexity perspective. They showed that solution size (i.e., number of nodes or depth of the DT) is insufficient to obtain fixed-parameter tractability (FPT). Therefore, they proposed an FPT algorithm that utilizes two auxiliary parameters: the maximum difference (as a structural property of the data set) and maximum domain size. They left open the question of whether bounding the maximum domain size is necessary. The main result of this paper answers this question. We present FPT algorithms for learning a smallest or lowest-depth DT from data, with solution size and maximum difference as the only parameters. Thus, our algorithm is significantly more potent than the one by Ordyniak and Szeider as it can handle problem inputs with features that range over unbounded domains. We also close several gaps concerning the quality of approximation one obtains by only considering DTs based on minimum support sets.
List of keywords
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
3109
Safe Multi-agent Learning via Trapping Regions
Aleksander Czechowski, Frans Oliehoek
One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy when learning concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier for deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we propose to apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. Upon verification of the direction of learning dynamics, the resulting trajectories are guaranteed not to escape such sets during the learning process. As a result, it is ensured that, despite the uncertainty over convergence of the applied algorithms, learning will never form hazardous joint strategy combinations. We introduce a binary partitioning algorithm for verification of trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where learning dynamics are not known. In addition, via a fixed point argument, we show the existence of a learning equilibrium within a trapping region. We demonstrate the applications to a regularized version of the Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in the state-of-the-art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
3115
Fairly Allocating Goods and (Terrible) Chores
Hadi Hosseini, Aghaheybat Mammadov, Tomasz Wąs
We study the fair allocation of a mixture of indivisible goods and chores under lexicographic preferences, a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation have remained a notable open problem. By identifying a class of instances with “terrible chores”, we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and Pareto optimal allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3117
Finding an ϵ-Close Minimal Variation of Parameters in Bayesian Networks
Bahare Salmani, Joost-Pieter Katoen
This paper addresses the ε-close parameter tuning problem for Bayesian networks (BNs): find a minimal ε-close amendment of probability entries in a given set of (rows in) conditional probability tables (CPTs) that makes a given quantitative constraint on the BN valid. Based on the state-of-the-art “region verification” techniques for parametric Markov chains, we propose an algorithm for this problem whose capabilities go beyond any existing BN tools. Our experiments show that ε-close tuning of large BN benchmarks with up to eight (explicitly varied) parameters is feasible. In particular, by allowing (i) varied parameters in multiple CPTs and (ii) inter-CPT parameter dependencies, we treat subclasses of parametric BNs that have received less attention so far.
List of keywords
Uncertainty in AI -> UAI: Bayesian networks
3127
Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size
Darshana Abeyrathna, Ahmed Abouzeid, Bimal Bhattarai, Charul Giri, Sondre Glimsdal, Ole-Christoffer Granmo, Lei Jiao, Rupsa Saha, Jivitesh Sharma, Svein Tunheim, Xuan Zhang
The Tsetlin Machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). As such, they become less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning, Clause Size Constrained TMs (CSC-TMs), where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches one literal. We finally analyze CSC-TM power consumption and derive new convergence properties.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Machine Learning -> ML: Other
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
3138
Sketch Recognition via Part-based Hierarchical Analogical Learning
Kezhen Chen, Ken Forbus, Balaji Vasan Srinivasan, Niyati Chhaya, Madeline Usher
Sketch recognition has been studied for decades, but it is far from solved. Drawing styles are highly variable across people and adapting to idiosyncratic visual expressions requires data-efficient learning. Explainability also matters, so that users can see why a system got confused about something. This paper introduces a novel part-based approach for sketch recognition, based on hierarchical analogical learning, a new method to apply analogical learning to qualitative representations. Given a sketched object, our system automatically segments it into parts and constructs multi-level qualitative representations of them. Our approach performs analogical generalization at multiple levels of part descriptions and uses coarse-grained results to guide interpretation at finer levels. Experiments on the Berlin TU dataset and the Coloring Book Objects dataset show that the system can learn explainable models in a data-efficient manner.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
3139
The Computational Complexity of Single-Player Imperfect-Recall Games
Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, Paul Goldberg
We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and another one uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings associate those three solution concepts of a game with solution concepts of a polynomial maximization problem, namely global optima, optimal points with respect to subsets of variables, and Karush–Kuhn–Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of ex-ante optimal or equilibrium strategies.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Other
Uncertainty in AI -> UAI: Decision and utility theory
3140
Can You Improve My Code? Optimizing Programs with Local Search
Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor, Levi Lelis
This paper introduces a system that performs local search for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines, to partition a synthesis task into several smaller tasks that can be solved with existing brute-force synthesis algorithms. POLIS improves a single line of the program while keeping the remaining lines fixed, and continues iterating until it is unable to improve the objective value of the program. POLIS was evaluated with a 27-person user study, where participants were instructed and rewarded to write programs that maximized the score of two single-agent games: lunar lander and highway. POLIS was able to substantially improve the participants’ programs with respect to the game scores. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
List of keywords
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-AI collaboration
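The line-by-line improvement loop described in the POLIS abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `local_line_search`, the toy instruction set `OPS`, and the objective `run` are hypothetical stand-ins, and a fixed candidate list replaces the brute-force synthesizer that the abstract mentions.

```python
from typing import Callable, List, Sequence

# Hypothetical toy "language": each program line is one of these operations.
OPS = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
    "sub3": lambda x: x - 3,
}


def run(program: List[str]) -> int:
    """Toy objective: execute the lines on x = 1 and return the final value."""
    x = 1
    for line in program:
        x = OPS[line](x)
    return x


def local_line_search(
    program: List[str],
    candidates: Sequence[str],
    objective: Callable[[List[str]], float],
) -> List[str]:
    """Change one line at a time, keeping the other lines fixed,
    until no single-line replacement improves the objective."""
    best = list(program)
    best_score = objective(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for cand in candidates:
                if cand == best[i]:
                    continue
                # Replace only line i; every other line stays as it is.
                trial = best[:i] + [cand] + best[i + 1:]
                score = objective(trial)
                if score > best_score:
                    best, best_score = trial, score
                    improved = True
    return best


start = ["sub3", "sub3", "add1"]
result = local_line_search(start, list(OPS), run)
```

The search stops at a program that no single-line change can improve, which in general is a local rather than a global optimum of the objective.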
3153
Disentanglement of Latent Representations via Causal Interventions
Gaël Gendron, Michael Witbrock, Gillian Dobbie
The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
List of keywords
Knowledge Representation and Reasoning -> KRR: Causality
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Autoencoders
3155
Exploring Structural Similarity in Fitness Landscapes via Graph Data Mining: A Case Study on Number Partitioning Problems
Mingyu Huang, Ke Li
One of the most common problem-solving heuristics is reasoning by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus, if a solver works for one problem instance, we expect it will also be effective for other instances whose fitness landscapes share structural similarities with it. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, using the local optima network as a proxy for fitness landscapes, this paper proposes to leverage graph data mining techniques to conduct qualitative and quantitative analyses that explore the latent topological structural information embedded in those landscapes. In our experiments, we use the number partitioning problem as the case study, and our empirical results support the overall assumption that structural similarity exists between landscapes within neighboring dimensions. In addition, experiments on simulated annealing demonstrate that the performance of a metaheuristic solver is similar on structurally similar landscapes.
List of keywords
Data Mining -> DM: Exploratory data mining
Data Mining -> DM: Data visualization
Search -> S: Combinatorial search and optimisation
3162
DiffAR: Adaptive Conditional Diffusion Model for Temporal-augmented Human Activity Recognition
Shuokang Huang, Po-Yu Chen, Julie McCann
Human activity recognition (HAR) is a fundamental sensing and analysis technique that supports diverse applications, such as smart homes and healthcare. In device-free and non-intrusive HAR, WiFi channel state information (CSI) captures wireless signal variations caused by human interference without the need for video cameras or on-body sensors. However, current CSI-based HAR performance is hampered by incomplete CSI recordings due to fixed window sizes in CSI collection and human/machine errors that incur missing values in CSI. To address these issues, we propose DiffAR, a temporal-augmented HAR approach that improves HAR performance by augmenting CSI. DiffAR devises a novel Adaptive Conditional Diffusion Model (ACDM) to synthesize augmented CSI, which tackles the issue of fixed windows by forecasting and handles missing values with imputation. Compared to existing diffusion models, ACDM improves the synthesis quality by guiding progressive synthesis with step-specific conditions. DiffAR further exploits an ensemble classifier for activity recognition using both raw and augmented CSI. Extensive experiments on four public datasets show that DiffAR achieves the best synthesis quality of augmented CSI and outperforms state-of-the-art CSI-based HAR methods in recognition performance. The source code of DiffAR is available at https://github.com/huangshk/DiffAR.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Applications
3171
On Optimal Strategies for Wordle and General Guessing Games
Michael Cunanan, Michael Thielscher
The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contributions are several theorems that build towards a general theory to prove optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.
List of keywords
Search -> S: Combinatorial search and optimisation
Multidisciplinary Topics and Applications -> MDA: Game playing
Search -> S: Applications
3174
Cognitively Inspired Learning of Incremental Drifting Concepts
Mohammad Rostami, Aram Galstyan
Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems
3195
Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models
Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang
In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., “brown” → “brow̨n”). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters. We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
3200
RZCR: Zero-shot Character Recognition via Radical-based Reasoning
Xiaolei Diao, Daqian Shi, Hao Tang, Qiang Shen, Yanzeng Li, Lei Wu, Hao Xu
The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets. Character image datasets are also affected by such unbalanced data distribution due to differences in character usage frequency. Thus, current character recognition methods are limited when applied in the real world, especially for the categories in the tail that lack training samples, e.g., uncommon characters. In this paper, we propose a zero-shot character recognition framework via radical-based reasoning, called RZCR, to improve the recognition performance of few-sample character categories in the tail. Specifically, we exploit radicals, the graphical units of characters, by decomposing and reconstructing characters according to orthography. RZCR consists of a visual semantic fusion-based radical information extractor (RIE) and a knowledge graph character reasoner (KGR). RIE aims to recognize candidate radicals and their possible structural relations from character images in parallel. The results are then fed into KGR to recognize the target character by reasoning with a knowledge graph. We validate our method on multiple datasets, and RZCR shows promising experimental results, especially on few-sample character datasets.
List of keywords
Computer Vision -> CV: Vision and language 
Multidisciplinary Topics and Applications -> MDA: Humanities
3221
Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification
Zheng Gong, Guifeng Wang, Ying Sun, Qi Liu, Yuting Ning, Hui Xiong, Jingyu Peng
Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noises of graphs induced by hidden anomalies connected with considerable benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noises and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world datasets of GAD demonstrate that the proposed framework achieves significantly better detection quality compared with the state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.
List of keywords
Data Mining -> DM: Applications
Data Mining -> DM: Anomaly/outlier detection
Data Mining -> DM: Mining graphs
3222
Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models
Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail Bragin, Ji Li, Caiwen Ding
Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, we zero (prune) unimportant model weights and train a derived compact model to improve final accuracy. Pruned weights are treated as useless and discarded, which usually leads to significant degradation in model accuracy. In this paper, we focus on attention head pruning, as head attention is a key component of Transformer-based language models and provides interpretable knowledge meaning. We reveal the relationship between pruned attention heads and retained heads and provide a solution, named peer distillation, to recycle the discarded knowledge from the pruned heads. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, removing the time-consuming human labor of tuning hyperparameters. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using the BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing heads by over 58% on average and outperforming state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).
List of keywords
Natural Language Processing -> NLP: Language models
3243
MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning
Zhehua Zhong, Tianyi Chen, Zhen Wang
Fine-tuning large-scale pre-trained language models has been demonstrated to be effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such uses of adversarial training correspond to pure-strategy games, which are inherently limited in the scope of their strategies and thus leave room for improvement. In order to push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent to establish MAT by a sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark experiments on large-scale pre-trained models, such as BERT and RoBERTa. MAT significantly outperforms the state-of-the-art methods on both the GLUE and ANLI benchmarks in terms of generalization and robustness.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Other
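The MAT abstract above names Entropy Mirror Descent as the tool used to reach a mixed-strategy equilibrium. As a rough illustration of that generic update only (not of MAT itself, whose sampling procedure is not described here), the sketch below applies the standard entropic mirror descent step on the probability simplex to a toy linear loss. The function name, the loss vector, and the learning rate are assumptions made for the example.

```python
import math
from typing import List


def entropy_mirror_descent_step(
    x: List[float], grad: List[float], lr: float
) -> List[float]:
    """One entropic mirror descent step on the probability simplex:
    a multiplicative update followed by renormalization."""
    weights = [xi * math.exp(-lr * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy problem: minimize the linear loss <loss, x> over mixed strategies x.
loss = [0.3, 0.1, 0.5]            # hypothetical per-strategy losses
strategy = [1 / 3, 1 / 3, 1 / 3]  # start from the uniform mixed strategy
for _ in range(200):
    # For a linear loss, the gradient with respect to x is the loss vector.
    strategy = entropy_mirror_descent_step(strategy, loss, lr=0.5)
```

Because the update is multiplicative, every iterate stays a valid probability distribution without an explicit projection, and in this toy problem the mass concentrates on the strategy with the smallest loss.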
3254
SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs
Sheng Tian, Jihai Dong, Jintang Li, Wenlong Zhao, Xiaolong Xu, Baokun Wang, Bowen Song, Changhua Meng, Tianyi Zhang, Liang Chen
Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As instances that appear in the real world are naturally connected and can be represented with graphs, graph neural networks have become increasingly popular in tackling the anomaly detection problem. Despite the promising results, research on anomaly detection has almost exclusively focused on static graphs, while the mining of anomalous patterns from dynamic graphs is rarely studied despite its significant application value. In addition, anomaly detection is typically tackled from semi-supervised perspectives due to the lack of sufficient labeled data. However, most proposed methods are limited to merely exploiting labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By a combination of a time-equipped memory bank and a pseudo-label contrastive learning module, SAD is able to fully exploit the potential of large unlabeled samples and uncover underlying anomalies on evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies from dynamic graphs and outperforms existing advanced methods even when provided with only a little labeled data.
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Time series and data streams
3260
Beyond Pure Text: Summarizing Financial Reports Based on Both Textual and Tabular Data
Ziao Wang, Zelin Jiang, Xiaofeng Zhang, Jaehyeon Soon, Jialu Zhang, Wang Xiaoyao, Hongwei Du
Abstractive text summarization aims to generate concise summaries that preserve both the salient information and the overall semantic meaning of the given documents. However, real-world documents, e.g., financial reports, generally contain rich data such as charts and tabular data, which invalidate most existing text summarization approaches. This paper is thus motivated to propose a novel approach that simultaneously summarizes both textual and tabular data. In particular, we first manually construct a “table+text → summary” dataset. Then, the tabular data is embedded in both a row-wise and a column-wise manner, and the textual data is encoded at the sentence level via a pre-trained model. We propose a salient detector gate that is applied between each pair of row/column and sentence embeddings. The highly correlated content is considered as salient information that must be summarized. Extensive experiments have been performed on our constructed dataset, and the promising results demonstrate the effectiveness of the proposed approach w.r.t. a number of both automatic and human evaluation criteria.
List of keywords
Natural Language Processing -> NLP: Summarization
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Language generation
3263
On Adversarial Robustness of Demographic Fairness in Face Attribute Recognition
Huimin Zeng, Zhenrui Yue, Lanyu Shang, Yang Zhang, Dong Wang
Demographic fairness has become a critical objective when developing modern visual models for identity-sensitive applications, such as face attribute recognition (FAR). While great efforts have been made to improve the fairness of the models, the investigation into the adversarial robustness of the fairness (e.g., whether the fairness of the models could still be maintained under potential malicious fairness attacks) is largely ignored. Therefore, this paper explores the adversarial robustness of demographic fairness in FAR applications from both attacking and defending perspectives. In particular, we first present a novel fairness attack, which aims at corrupting the demographic fairness of face attribute classifiers. Next, to mitigate the effect of the fairness attack, we design an efficient defense algorithm called robust-fair training. With this defense, face attribute classifiers learn how to combat the bias introduced by the fairness attack. As such, the face attribute classifiers are not only trained to be fair, but the fairness is also robust. Our extensive experimental results show the effectiveness of both our proposed attack and defense methods across various model architectures and FAR applications. We believe our work could serve as a strong baseline for future work on robust demographic fairness.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Bias
Computer Vision -> CV: Bias, fairness and privacy
3271
IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation
Jeongho Kim, Hanbeen Lee, Simon S. Woo
Knowledge distillation (KD) is an effective method for transferring the knowledge of a teacher model to a student model, which aims to improve the latter’s performance efficiently. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements on various tasks, only marginal improvements are shown in student networks due to their limited model capacity. In this work, to address the student model’s limitation, we propose a novel flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) to improve the overall performance of the student model by directly distilling the teacher’s knowledge into branches of student models. The generated output of IFD, which is trained by the teacher model, is effectively combined by attentive logit. We use only a few blocks of the student and the trained IFD during inference, requiring an equal or smaller number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin over various datasets in different tasks without extra computation.
List of keywords
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
Computer Vision -> CV: Representation learning
3273
Random Assignment of Indivisible Goods under Constraints
Yasushi Kawase, Hanna Sumita, Yu Yokoi
We investigate the problem of random assignment of indivisible goods, in which each agent has an ordinal preference and a constraint. Our goal is to characterize the conditions under which a random assignment that simultaneously satisfies efficiency and envy-freeness always exists. The probabilistic serial mechanism ensures the existence of such an assignment for the unconstrained setting. In this paper, we consider a more general setting in which each agent can consume a set of items only if the set satisfies her feasibility constraint. Such constraints must be taken into account in student course placements, employee shift assignments, and so on. We demonstrate that an efficient and envy-free assignment may not exist even for the simple case of partition matroid constraints, where the items are categorized, and each agent demands one item from each category. We then identify special cases in which an efficient and envy-free assignment always exists. For these cases, the probabilistic serial cannot be naturally extended; therefore, we provide mechanisms to find the desired assignment using various approaches.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design
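The probabilistic serial mechanism that the abstract above cites for the unconstrained setting is commonly described as a simultaneous eating procedure: all agents consume their most preferred remaining item at the same unit speed until every item is gone. The following is a minimal sketch of that textbook description, assuming every agent ranks all items and each item has unit supply. The function name and the three-agent example are illustrative, and the constrained mechanisms proposed in the paper are not shown.

```python
from fractions import Fraction
from typing import Dict, List


def probabilistic_serial(
    prefs: Dict[str, List[str]]
) -> Dict[str, Dict[str, Fraction]]:
    """Return each agent's fractional share (assignment probability) per item."""
    items = {item for ranking in prefs.values() for item in ranking}
    remaining = {item: Fraction(1) for item in items}
    shares = {agent: {item: Fraction(0) for item in items} for agent in prefs}
    while True:
        # Each agent targets its most preferred item that still has supply.
        targets = {}
        for agent, ranking in prefs.items():
            for item in ranking:
                if remaining[item] > 0:
                    targets[agent] = item
                    break
        if not targets:
            break
        # Count how many agents are eating each targeted item.
        eaters: Dict[str, int] = {}
        for item in targets.values():
            eaters[item] = eaters.get(item, 0) + 1
        # Advance time until the first targeted item is exhausted.
        step = min(remaining[item] / count for item, count in eaters.items())
        for agent, item in targets.items():
            shares[agent][item] += step
            remaining[item] -= step
    return shares


prefs = {
    "A": ["a", "b", "c"],
    "B": ["a", "b", "c"],
    "C": ["b", "a", "c"],
}
shares = probabilistic_serial(prefs)
```

Exact rational arithmetic via `Fraction` avoids floating-point drift when checking that an item is exhausted, and each loop iteration exhausts at least one item, so the procedure terminates after at most as many rounds as there are items.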
3281
Some General Identification Results for Linear Latent Hierarchical Causal Structure
Zhengming Chen, Feng Xie, Jie Qiao, Zhifeng Hao, Ruichu Cai
We study the problem of learning hierarchical causal structure among latent variables from many measured variables. Although there are a few methods that are able to recover the latent hierarchical causal structure, they suffer from restrictive assumptions, such as a tree-structured graph, no “triangle” structure, or non-Gaussianity. In this paper, we consider the more general and challenging scenario in which the graph is not restricted to be tree-structured or free of “triangle” structures, and the noise terms of the data may be only partially non-Gaussian. We show that the hierarchical causal structure is identifiable under milder graphical conditions. Specifically, we first show that, based on second-order statistics, the latent hierarchical structure can be identified up to the Markov equivalence classes over latent variables. Then, we show that some directions among latent variables in those Markov equivalence classes can be inferred based on partial higher-order statistics. Further, we design a method to efficiently learn the latent hierarchical structure. The experimental results on synthetic data verify the efficiency of the proposed method.
List of keywords
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
3294
MMPN: Multi-supervised Mask Protection Network for Pansharpening
Changjie Chen, Yong Yang, Shuying Huang, Wei Tu, Weiguo Wan, Shengna Wei
Pansharpening fuses a panchromatic (PAN) image with a multispectral (MS) image to obtain a high-spatial-resolution multispectral (HRMS) image. Deep learning-based pansharpening methods usually apply the convolution operation to extract features and only consider the similarity of gradient information between PAN and HRMS images, resulting in edge blur and spectral distortion in the fusion results. To solve these problems, a multi-supervised mask protection network (MMPN) is proposed to prevent spatial information from being damaged and to overcome spectral distortion in the learning process. Firstly, by analyzing the relationships between high-resolution images and corresponding degraded images, a mask protection strategy (MPS) for edge protection is designed to guide the recovery of fused images. Then, based on the MPS, an MMPN containing four branches is constructed to generate the fusion and mask protection images. In MMPN, each branch employs a dual-stream multi-scale feature fusion module (DMFFM), which is built to extract and fuse the features of two input images. Finally, different loss terms are defined for the four branches and combined into a joint loss function to realize network training. Experiments on simulated and real satellite datasets show that our method is superior to state-of-the-art methods both subjectively and objectively.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
3308
Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning
Xiaoli Tang, Han Yu
Auction-based Federated Learning (AFL) is a key technology to enable open collaboration among self-interested data consumers and data owners. Existing AFL approaches cannot manage the mutual influence among multiple data consumers competing to enlist data owners. Moreover, they cannot support a single data owner joining multiple data consumers simultaneously. To bridge these gaps, we propose the Multi-Agent Reinforcement Learning for AFL (MARL-AFL) approach to steer data consumers to bid strategically towards an equilibrium with desirable overall system characteristics. We design a temperature-based reward reassignment scheme to make trade-offs between cooperation and competition among AFL data consumers. In this way, MARL-AFL can reach an equilibrium state that ensures individual data consumers can achieve good utility, while preserving system-level social welfare. To circumvent potential collusion behaviors among data consumers, we introduce a bar agent to set a personalized bidding lower bound for each data consumer. Extensive experiments on six commonly adopted benchmark datasets show that MARL-AFL significantly outperforms six state-of-the-art approaches, beating the best of them by 12.2%, 1.9% and 3.4% in terms of average social welfare, revenue and model accuracy, respectively.
List of keywords
Machine Learning -> ML: Federated learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
3314
iRe^2f: Rethinking Effective Refinement in Language Structure Prediction via Efficient Iterative Retrospecting and Reasoning
Zuchao Li, Xingyi Guo, Letian Peng, Lefei Zhang, Hai Zhao
Refinement plays a critical role in language structure prediction, a process that deals with complex situations such as structural edge interdependencies. Since language structure prediction is usually modeled as graph parsing, typical refinement methods take an initial parsing graph as input and refine it using the language input and other relevant information. Intuitively, a refinement component, i.e., a refiner, should be lightweight and efficient, as it is only responsible for correcting faults in the initial graph. However, current refiners add a significant burden to the parsing process due to their reliance on time-consuming encoding-decoding procedures on the language input and graph. To make the refiner more practical for real-world applications, this paper proposes a lightweight but effective iterative refinement framework, iRe^2f, based on iterative retrospecting and reasoning without re-encoding the graph. iRe^2f iteratively refines the parsing graph based on the interaction between graph and sequence, and efficiently learns a shortcut to update the sequence and graph representations in each iteration. The shortcut is calculated based on the graph representation in the latest iteration. iRe^2f reduces the number of refinement parameters by 90% compared to the previous smallest refiner. Experiments on a variety of language structure prediction tasks show that iRe^2f performs comparably to or better than current state-of-the-art refiners, with a significant increase in efficiency.
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Tagging, chunking, and parsing
3319
Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting
Gaoang Wang, Mingli Song
Human pose forecasting is a sequential modeling task that aims to predict future poses from historical motions. Most existing approaches focus on the spatial-temporal neural network model design for learning movement patterns to reduce prediction errors. However, they usually do not strictly follow the temporal constraints in the inference stage. Even when a small Mean Per Joint Position Error (MPJPE) is achieved, some of the predicted poses are not temporally feasible solutions, which violates the continuity of body movement. In this paper, we consider the temporal constrained feasible solutions for human pose forecasting, where the poses predicted from input historical poses are guaranteed to obey the temporal constraints strictly in the inference stage. Rather than directly supervising the prediction in the original pose space, a temporal constrained subspace is explicitly learned and then followed by an inverse transformation to obtain the final predictions. We evaluate the proposed method on large-scale benchmarks, including Human3.6M, AMASS, and 3DPW. With STS-GCN as the encoder backbone, state-of-the-art performance is achieved with the temporal constrained feasible solutions.
List of keywords
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
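To make the MPJPE metric mentioned in the abstract above concrete, here is a minimal Python sketch. It uses the common definition of MPJPE as the mean Euclidean distance between predicted and ground-truth joints. The helper `max_frame_jump` is a hypothetical continuity check added for illustration only; the abstract does not specify the temporal constraints the paper enforces.

```python
import math

def mpjpe(predicted, target):
    """Mean Per Joint Position Error.

    predicted, target: lists of frames; each frame is a list of (x, y, z) joints.
    Returns the mean Euclidean distance over all joints and frames.
    """
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(predicted, target):
        for p, t in zip(pred_frame, true_frame):
            total += math.dist(p, t)
            count += 1
    return total / count

def max_frame_jump(poses):
    """Largest displacement of any joint between consecutive frames.

    Hypothetical continuity measure (an assumption, not from the paper):
    a large value suggests a physically implausible jump in the sequence.
    """
    return max(
        math.dist(a, b)
        for prev, cur in zip(poses, poses[1:])
        for a, b in zip(prev, cur)
    )
```

A sequence can score well on `mpjpe` on average while still containing a large `max_frame_jump`, which is the kind of gap between accuracy and temporal feasibility the abstract describes.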
3349
Annealing Genetic Based Preposition Substitution for Text Rubbish Example Generation
Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen
Modern Natural Language Processing (NLP) models exhibit under-sensitivity towards text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes remain human-readable. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish example generation with two major merits. Firstly, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which causes less degradation of the model’s confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose the fact that NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for nonsensical preposition sequences.
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Text classification
3352
Detecting Adversarial Faces Using Only Real Face Self-Perturbations
Qian Wang, Yongqin Xian, Hefei Ling, Jinyuan Zhang, Xiaorui Lin, Ping Li, Jiazhong Chen, Ning Yu
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend against newly emerging attacks that are unseen by defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Applications
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
3357
Spatially Constrained Adversarial Attack Detection and Localization in the Representation Space of Optical Flow Networks
Hannah Kim, Celia Cintas, Girmaw Abebe Tadesse, Skyler Speakman
Optical flow estimation has shown significant improvements with advances in deep neural networks. However, these flow networks have recently been shown to be vulnerable to patch-based adversarial attacks, which pose security risks in real-world applications, such as self-driving cars and robotics. We propose SADL, a Spatially constrained adversarial Attack Detection and Localization framework, to detect and localize these patch-based attacks without requiring dedicated training. The detection of an attacked input sequence is performed via iterative optimization on the features from the inner layers of flow networks, without any prior knowledge of the attacks. The novel spatially constrained optimization ensures that the detected anomalous subset of features comes from a local region. To this end, SADL provides a subset of nodes within a spatial neighborhood that contribute more to the detection, which is then used to localize the attack in the input sequence. The proposed SADL is validated across multiple datasets and flow networks. With patch attacks sized at 4.8% of the input image resolution on RAFT, our method successfully detects and localizes them with an average precision of 0.946 and 0.951 for the KITTI-2015 and MPI-Sintel datasets, respectively. The results show that SADL consistently achieves higher detection rates than existing methods and provides new localization capabilities.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
3364
Fast Algorithms for SAT with Bounded Occurrences of Variables
Junqiang Peng, Mingyu Xiao
We present fast algorithms for the general CNF satisfiability problem (SAT) with running-time bound $O^*({c_{d}}^{n})$, where $c_d$ is a function of the average occurrence $d$ of variables, and $n$ is the number of variables in the input formula. Similar to SAT with bounded clause lengths, SAT with bounded occurrences of variables has also been extensively studied in the literature. In particular, the running-time bounds for small $d$, say $d=3$ and $4$, have become the bottlenecks for algorithms evaluated by the formula length $L$ and other algorithms. In this paper, we show that SAT can be solved in time $O^*(1.1238^n)$ for $d=3$ and $O^*(1.2628^n)$ for $d=4$, respectively improving the previous results $O^*(1.1279^n)$ and $O^*(1.2721^n)$ obtained by Wahlström (SAT 2005) nearly 20 years ago. For $d\geq 5$, we obtain the running-time bound $O^*(1.0641^{dn})$, which implies the bound $O^*(1.0641^{L})$ with respect to the formula length $L$. This result is also competitive with the previous result $O^*(1.0646^{L})$ by Peng and Xiao (SAT 2021).
List of keywords
Constraint Satisfaction and Optimization -> CSO: Satisfiability
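The step from the bound in $dn$ to the bound in $L$ in the abstract above can be spelled out in one line. This sketch assumes that the formula length $L$ counts the total number of literal occurrences, so that the average occurrence of a variable is $d = L/n$; the abstract does not state this definition explicitly.

```latex
d = \frac{L}{n}
\;\Longrightarrow\;
dn = L
\;\Longrightarrow\;
O^*\!\left(1.0641^{dn}\right) = O^*\!\left(1.0641^{L}\right),
\qquad
c_d = 1.0641^{d} \ \text{ for } d \geq 5 .
```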
3365
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
Ridhima Bector, Hang Xu, Abhay M S Aradhya, Chai Quek, Zinovi Rabinovich
As Reinforcement Learning (RL) solutions are becoming ubiquitous, so is the study of potential threats to their training and deployment. While single-learner training-time attacks, capable of "pre-programming" behavioral triggers into a strategy, receive increasing attention, attacks on collections of learning agents have been largely overlooked. We remedy the situation by developing a constructive training-time attack on a population of learning agents and make the attack agnostic to the size of the population. The attack constitutes a sequence of environment (re)parameterizations (poisonings), generated to overcome individual differences between agents and lead the entire population to the same target behavior while minimizing effective environment modulation. Our method is demonstrated on populations of independent learners in "ghost" environments (learners do not interact or perceive each other) as well as environments with mutual awareness, with or without individual learning. From the attack perspective, we pursue an ultra-blackbox setting, i.e., cross-policy traces of the victim learners are the only input both for attack conditioning and attack evaluation during the attacker’s training. To manage the resulting uncertainty in population behavior, we deploy a novel Wasserstein distance-based Gaussian embedding of detected behaviors within the population of victim learners. To align with prior works on environment poisoning, our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Deep reinforcement learning
3369
Hierarchical Prompt Learning for Compositional Zero-Shot Recognition
Henan Wang, Muli Yang, Kun Wei, Cheng Deng
Compositional Zero-Shot Learning (CZSL) aims to imitate the powerful generalization ability of human beings to recognize novel compositions of known primitive concepts that correspond to a state and an object, e.g., purple apple. To fully capture the intra- and inter-class correlations between compositional concepts, in this paper, we propose to learn them in a hierarchical manner. Specifically, we set up three hierarchical embedding spaces that respectively model the states, the objects, and their compositions, which serve as three “experts” that can be combined in inference for more accurate predictions. We achieve this based on the recent success of large-scale pretrained vision-language models, e.g., CLIP, which provides a strong initial knowledge of image-text relationships. To better adapt this knowledge to CZSL, we propose to learn three hierarchical prompts by explicitly fixing the unrelated word tokens in the three embedding spaces. Despite its simplicity, our proposed method consistently yields superior performance over current state-of-the-art approaches on three widely-used CZSL benchmarks.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning   
3378
The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets
Xinru Wang, Chen Liang, Ming Yin
The use of AI-based decision aids in diverse domains has inspired many empirical investigations into how AI models’ decision recommendations impact humans’ decision accuracy in AI-assisted decision making, while explorations of the impacts on humans’ decision fairness are largely lacking despite their clear importance. In this paper, using a real-world business decision making scenario (bidding in rental housing markets) as our testbed, we present an experimental study on understanding how the bias level of the AI-based decision aid as well as the provision of AI explanations affect the fairness level of humans’ decisions, both during and after their usage of the decision aid. Our results suggest that when people are assisted by an AI-based decision aid, both a higher level of racial bias exhibited by the decision aid and, surprisingly, the presence of AI explanations result in more unfair human decisions across racial groups. Moreover, these impacts are partly made through triggering humans’ “disparate interactions” with AI. However, regardless of the AI bias level and the presence of AI explanations, when people return to make independent decisions after their usage of the AI-based decision aid, their decisions no longer exhibit significant unfairness across racial groups.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Human-computer interaction
3386
Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs
Kaiwen Xu, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma
A concept-based classifier can explain the decision process of a deep learning model through human-understandable concepts in image classification problems. However, concept-based explanations may sometimes produce false positives, mistakenly treating unrelated concepts as important for the prediction task. Our goal is to find the statistically significant concepts for classification to prevent misinterpretation. In this study, we propose a method that uses a deep learning model to learn the image concepts and then uses knockoff samples to select the important concepts for prediction while controlling the False Discovery Rate (FDR) under a certain value. We evaluate the proposed method in experiments on both synthetic and real data. The results show that our method can control the FDR properly while selecting highly interpretable concepts to improve the trustworthiness of the model.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning
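As a rough illustration of how knockoff-based selection can control the FDR, the sketch below implements the standard knockoff+ thresholding rule from the knockoff filter literature. It is not taken from the paper: the abstract does not say which statistic or threshold rule the authors use. The input `w` is a hypothetical list of per-concept importance statistics, where a large positive value means the real concept looks more important than its knockoff copy.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t > 0 such that (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q.

    w: per-concept knockoff statistics (hypothetical inputs).
    q: target FDR level.
    Returns float('inf') when no threshold satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")

def select_concepts(w, q):
    """Indices of concepts whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The count of negative statistics acts as an estimate of how many false discoveries sit among the positives, which is what lets the rule bound the FDR at level `q`.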
3395
OptIForest: Optimal Isolation Forest for Anomaly Detection
Haolong Xiang, Hongsheng Hu, Xiaolong Xu, Lianyong Qi, Wanchun Dou, Mark Dras, Amin Beheshti, Xuyun Zhang
Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency; e.g., iForest is often employed as a state-of-the-art detector in real deployments. While the majority of isolation forests use a binary structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash, enabling more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmark datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning-based ones.
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Ensemble methods
3398
Action Space Reduction for Planning Domains
Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi
Planning tasks succinctly represent labeled transition systems, with each ground action corresponding to a label. This granularity, however, is not necessary for solving planning tasks and can be harmful, especially for model-free methods. In order to apply such methods, the label sets are often manually reduced. In this work, we propose automating this manual process. We characterize a valid label reduction for classical planning tasks and propose an automated way of obtaining such valid reductions by leveraging lifted mutex groups. Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning.
List of keywords
Planning and Scheduling -> PS: Theoretical foundations of planning
3414
Dual Personalization on Federated Recommendation
Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions typically combine distributed recommendation algorithms with privacy-preserving mechanisms. They thus inherently take the form of heavyweight models at the server, which hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views of item representations, which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets demonstrate the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed new light on the design of recommender systems in federated settings. The code is available.
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Privacy-preserving data mining
Data Mining -> DM: Recommender systems
3422
Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays
Kazi Ashik Islam, Da Qi Chen, Madhav Marathe, Henning Mortveit, Samarth Swarup, Anil Vullikanti
Evacuation planning is a crucial part of disaster management where the goal is to relocate people to safety and minimize casualties. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that combines heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion and to find optimized plans that account for such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of average evacuation time, evacuation completion time, and optimality guarantee of the solutions (13%, 21%, 58% improvement respectively). We then show that MIP-LNS-SIM outperforms MIP-LNS in terms of average evacuation time, evacuation completion time, and average time spent on the road (10%, 17%, 77% improvement respectively) when delay due to congestion is considered. In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time (6%) compared to MIP-LNS (76%).
List of keywords
Planning and Scheduling -> PS: Search in planning and scheduling
Agent-based and Multi-agent Systems -> MAS: Applications
Search -> S: Heuristic search
3423
New Algorithms for the Fair and Efficient Allocation of Indivisible Chores
Jugal Garg, Aniket Murhekar, John Qin
We study the problem of fairly and efficiently allocating indivisible chores among agents with additive disutility functions. We consider the widely used envy-based fairness properties of EF1 and EFX in conjunction with the efficiency property of fractional Pareto-optimality (fPO). Existence (and computation) of an allocation that is simultaneously EF1/EFX and fPO are challenging open problems, and we make progress on both of them. We show the existence of an allocation that is (i) EF1 + fPO when there are three agents, (ii) EF1 + fPO when there are at most two disutility functions, and (iii) EFX + fPO for three agents with bivalued disutility functions. These results are constructive, based on strongly polynomial-time algorithms. We also investigate non-existence and show that an allocation that is EFX + fPO need not exist, even for two agents.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
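For readers unfamiliar with EF1 in the chores setting, the following Python sketch checks the usual definition: an allocation is EF1 if, for every pair of agents, any envy disappears after removing a single chore from the envious agent's own bundle. It assumes additive, non-negative disutilities, and only verifies a given allocation. It is not the paper's algorithm, and the function names and data layout are illustrative assumptions.

```python
def bundle_cost(disutility, bundle):
    """Additive disutility of a bundle for one agent."""
    return sum(disutility[c] for c in bundle)

def is_ef1_chores(disutilities, allocation):
    """Check EF1 for chores.

    disutilities[i][c]: non-negative cost of chore c to agent i.
    allocation[i]: list of chore indices assigned to agent i.
    """
    n = len(allocation)
    for i in range(n):
        own = bundle_cost(disutilities[i], allocation[i])
        # Under additivity, dropping the costliest own chore is the best single removal.
        worst = max((disutilities[i][c] for c in allocation[i]), default=0)
        for j in range(n):
            if i == j:
                continue
            other = bundle_cost(disutilities[i], allocation[j])
            if own - worst > other:
                return False
    return True
```

For example, with `disutilities = [[1, 2, 3], [3, 2, 1]]`, giving chore 0 to the first agent and chores 1 and 2 to the second passes the check, while giving all three chores to the first agent does not.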
3434
pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting
Yunyi Zhou, Zhixuan Chu, Yijia Ruan, Ge Jin, Yuchen Huang, Sheng Li
Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model relies heavily on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be straightforwardly averaged over different models, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on a Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. In addition, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.
List of keywords
Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Time series and data streams
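The convergence claim in the abstract above can be illustrated on the hidden-state chain alone. The Python sketch below computes the stationary distribution of a small Markov chain by power iteration and compares it with state frequencies from a simulated trajectory. The transition matrix and function names are hypothetical, and the sketch covers only the hidden chain, not the observed series or the pTSE ensemble itself.

```python
import random

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of transition matrix P by power iteration.

    P[i][j] is the probability of moving from state i to state j.
    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def empirical_state_frequencies(P, steps, seed=0):
    """Simulate the chain from state 0 and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    n = len(P)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        counts[state] += 1
        state = rng.choices(range(n), weights=P[state])[0]
    return [c / steps for c in counts]
```

With `P = [[0.9, 0.1], [0.5, 0.5]]`, the stationary distribution is (5/6, 1/6), and the simulated frequencies approach it as the number of steps grows.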
3440
DiSProD: Differentiable Symbolic Propagation of Distributions for Planning
Palash Chatterjee, Ashutosh Chapagain, Weizhe Chen, Roni Khardon
The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy’s value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
List of keywords
Planning and Scheduling -> PS: Planning under uncertainty
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Robot planning
3449
K* Search Over Orbit Space for Top-k Planning
Michael Katz, Junkyu Lee
Top-k planning is a key formalism for many planning applications. K* search is a well-established approach to top-k planning. The algorithm iteratively runs A* search and Eppstein’s algorithm until a sufficient number of plans is found. The performance of the K* algorithm is therefore inherently limited by the performance of A*, and in order to improve K* performance, that of A* must be improved. In cost-optimal planning, orbit space search improves A* performance, essentially performing A* in the orbit space instead of the state space. In this work, we take a similar approach to top-k planning. We show a theoretical equivalence between the goal paths in the state space and in the orbit space, which allows K* search to be performed in the orbit space instead, with plans reconstructed from the found paths. We prove that our algorithm is sound and complete for top-k planning and empirically show that it achieves state-of-the-art performance, outperforming all existing top-k planners.
List of keywords
Planning and Scheduling -> PS: Search in planning and scheduling
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Theoretical foundations of planning
3475
Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?
Bing Liu, Wei Luo, Gang Li, Jing Huang, Bo Yang
As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, and illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embedding. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.
List of keywords
Data Mining -> DM: Networks
Machine Learning -> ML: Time series and data streams
3478
RaMLP: Vision MLP via Region-aware Mixing
Shenqi Lai, Xi Du, Jia Guo, Kaipeng Zhang
Recently, MLP-based architectures have achieved impressive results in image classification compared with CNNs and ViTs. However, they have an obvious limitation: their parameters are tied to image sizes, so they can only process fixed image sizes. Therefore, they cannot directly adapt to dense prediction tasks (e.g., object detection and semantic segmentation) where images are of various sizes. Recent methods tried to address this but introduced two new problems: long-range dependencies or important visual cues are ignored. This paper presents a new MLP-based architecture, Region-aware MLP (RaMLP), to satisfy various vision tasks and address the above three problems. In particular, we propose a well-designed module, Region-aware Mixing (RaM). RaM captures important local information and further aggregates these important visual cues. Based on RaM, RaMLP achieves a global receptive field even in one block. It is worth noting that, unlike most existing MLP-based architectures that apply the same spatial weights to all samples, RaM is region-aware and adaptively determines weights to extract region-level features better. Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
3482
Social Motivation for Modelling Other Agents under Partial Observability in Decentralised Training
Dung Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran
Understanding other agents is a key challenge in constructing artificial social agents. Current work focuses on centralised training, wherein agents are allowed to know all the information about others and the environmental state during training. In contrast, this work studies decentralised training, wherein agents must learn a model of other agents in order to cooperate with them under partially observable conditions, even during training, i.e. learning agents are myopic. The intrinsic motivation for artificial agents is modelled on the concept of human social motivation, which entices humans to meet and understand each other, especially when experiencing a utility loss. Our intrinsic motivation encourages agents to stay near each other to obtain better observations and construct a model of others. They do so when their model of other agents is poor or when the overall task performance is bad during the learning phase. This simple but effective method facilitates the process of modelling others, resulting in a significant improvement in performance on cooperative tasks. Our experiments demonstrate that the socially motivated agent can model others better and promote cooperation across different tasks.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Other
3497
JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval
Haojie Wei, Yuan Jun, Rui Zhang, Yueguo Chen, Gang Wang
Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of Pitch, Onset and Offset, respectively, and that JEPOO is robust across various types of data and instruments. The ablation study validates the effectiveness of each component of JEPOO.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
Multidisciplinary Topics and Applications -> MDA: Entertainment
Multidisciplinary Topics and Applications -> MDA: Other
3510
Boosting Few-Shot Open-Set Recognition with Multi-Relation Margin Loss
Yongjuan Che, Yuexuan An, Hui Xue
Few-shot open-set recognition (FSOSR) has become a great challenge, as it requires classifying known classes and rejecting unknown ones with only limited samples. Existing FSOSR methods mainly construct an ambiguous distribution of known classes from scarce known samples without considering the latent distribution information of unknowns, which degrades the performance of open-set recognition. To address this issue, we propose a novel loss function called multi-relation margin (MRM) loss that can be plugged into few-shot methods to boost the performance of FSOSR. MRM enlarges the margin between different classes by extracting the multi-relationship of paired samples to dynamically refine the decision boundary for known classes and implicitly delineate the distribution of unknowns. Specifically, MRM separates the classes by enforcing a margin while concentrating samples of the same class on a hypersphere with a learnable radius. In order to better capture the distribution information of each class, MRM extracts the similarity and correlations among paired samples, improving the optimization of the margin and radius. Experiments on public benchmarks reveal that methods with MRM loss can improve the AUROC of unknown detection by a significant margin while correctly classifying the known classes.
List of keywords
Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Few-shot learning
3525
SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference
Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang
Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must go through all consecutive layers before early exiting, and more complex samples usually go through more layers, so redundant computation remains. In this paper, we propose a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, named SmartBERT, which adds a skipping gate and an exiting operator to each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and incorporate it into our training phases to boost the intermediate layers and classifiers, which is beneficial for early exiting. To keep the usage of skipping gates consistent between the training and inference phases, we propose a hard weight mechanism during the training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves a 2-3× computation reduction with minimal accuracy drops compared with BERT, and that our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets, we prove that early exiting based on entropy hardly works, and that the skipping mechanism is essential for reducing computation.
List of keywords
Natural Language Processing -> NLP: Language models
Natural Language Processing -> NLP: Text classification
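The SmartBERT abstract above refers to early exiting based on entropy. As a rough illustration of that baseline criterion only (not of SmartBERT's skipping gates or exiting operators, which the abstract does not specify), the following minimal sketch exits at the first layer whose classifier output has entropy below a threshold. The function names, the per-layer logits input, and the threshold value are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_layer(per_layer_logits, threshold):
    """Return (layer_index, probs) for the first layer whose prediction
    entropy falls below `threshold`; fall back to the last layer.

    `per_layer_logits` is a hypothetical list holding one logit vector
    per intermediate classifier, ordered from shallow to deep.
    """
    if not per_layer_logits:
        raise ValueError("need at least one layer")
    probs = None
    for i, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            return i, probs  # confident enough: stop computing here
    return len(per_layer_logits) - 1, probs


# Toy input: the prediction becomes more confident with depth.
toy_logits = [[0.1, 0.0], [2.0, -2.0], [5.0, -5.0]]
exit_layer, exit_probs = early_exit_layer(toy_logits, threshold=0.2)
```

In this toy input the first layer is nearly uniform (high entropy) and the second is confident, so the sketch exits at index 1. A sample whose entropy never drops below the threshold runs through every layer, which is the kind of residual cost the abstract attributes to entropy-based exiting.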
3526
Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow
Xiaolin Zheng, Mengpu Liu, Mengying Zhu
In financial scenarios, influenced by common factors such as global macroeconomic and sector-specific factors, stocks exhibit varying degrees of correlation with each other, which is essential in risk-averse portfolio allocation. Because the real risk matrix is unobservable, the covariance-based correlation matrix is widely used for constructing diversified stock portfolios. However, few studies focus on dynamic correlation matrix estimation under the non-stationary financial market. Moreover, as the number of stocks grows, the training process of existing correlation matrix estimation methods becomes significantly more complicated and slower. In this paper, we propose a novel hash-based dynamic correlation forecasting model (HDCF) to estimate the dynamic stock correlation. Under a structural assumption of sparsity and slowly varying evolution, HDCF learns the hash representation of the correlation matrix, which performs extremely efficiently in high-dimensional settings. Experiments show that our proposed model outperforms baselines on portfolio decisions.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Representation learning
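For readers unfamiliar with the covariance-based correlation matrix that the HDCF abstract takes as its starting point, here is a minimal sketch of that baseline, plus a rolling-window variant as one simple way to make the estimate time-varying. It is not the hash-based HDCF model. The ticker names, return values, window length, and the convention of returning 0.0 for constant series are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # correlation is undefined for a constant series (assumed convention)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(returns):
    """Static correlation matrix as a nested dict keyed by asset name."""
    names = list(returns)
    return {a: {b: pearson(returns[a], returns[b]) for b in names} for a in names}


def rolling_correlation(x, y, window):
    """Correlation over each trailing window: a naive dynamic estimate."""
    return [pearson(x[i - window:i], y[i - window:i])
            for i in range(window, len(x) + 1)]


# Hypothetical daily returns for three made-up tickers.
toy_returns = {
    "A": [0.01, -0.02, 0.03, 0.00],
    "B": [0.02, -0.04, 0.06, 0.00],    # moves with A
    "C": [-0.01, 0.02, -0.03, 0.00],   # moves against A
}
corr = correlation_matrix(toy_returns)
rolling = rolling_correlation(toy_returns["A"], toy_returns["B"], window=3)
```

The full matrix has one entry per pair of assets, so its size grows quadratically with the number of stocks. That growth is a plausible reading of the scaling concern the abstract raises.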
3531
Guide to Control: Offline Hierarchical Reinforcement Learning using Subgoal Generation for Long-Horizon and Complex Tasks
Wonchul Shin, Yusung Kim
Reinforcement learning (RL) has achieved considerable success in many fields, but applying it to real-world problems can be costly and risky because it requires a lot of online interaction. Recently, offline RL has shown the possibility of extracting a solution from existing logged data without online interaction. In this work, we propose an offline hierarchical RL method, Guider (Guide to Control), that can efficiently solve long-horizon and complex tasks from offline data. The high-level policy sequentially generates subgoals that guide the agent to the final goal, and the low-level policy learns how to reach each generated subgoal. In the process of learning from offline data, the key is to make the generated subgoals reachable by the low-level policy. We show that high-quality subgoal generation is possible through pre-training a latent subgoal prior model. The well-regulated subgoal generation improves performance while avoiding distributional shifts in offline RL by breaking down long, complex tasks into shorter, easier ones. In evaluations, Guider outperforms prior offline RL methods on long-horizon robot navigation and complex manipulation benchmarks. Our code is available at "this".
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Behavior and control
3535
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data
Qing Xu, Min Wu, Xiaoli Li, Kezhi Mao, Zhenghua Chen
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between the model training (source) and deployment (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased towards source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called UNiversal and joInt Knowledge Distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align the teacher’s and student’s representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks. The source code is available at https://github.com/ijcai2023/UNI KD.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Unsupervised learning
3540
FedNoRo: Towards Noise-Robust Federated Learning by Addressing Class Imbalance and Label Noise Heterogeneity
Nannan Wu, Li Yu, Xuefeng Jiang, Kwang-Ting Cheng, Zengqiang Yan
Federated noisy label learning (FNLL) is emerging as a promising tool for privacy-preserving multi-source decentralized learning. Existing research, relying on the assumption of class-balanced global data, may be incapable of modeling complicated label noise, especially in medical scenarios. In this paper, we first formulate a new and more realistic federated label noise problem where global data is class-imbalanced and label noise is heterogeneous, and then propose a two-stage framework named FedNoRo for noise-robust federated learning. Specifically, in the first stage of FedNoRo, per-class loss indicators followed by a Gaussian Mixture Model are deployed for noisy client identification. In the second stage, knowledge distillation and a distance-aware aggregation function are jointly adopted for noise-robust federated model updating. Experimental results on the widely used ICH and ISIC2019 datasets demonstrate the superiority of FedNoRo over state-of-the-art FNLL methods in addressing class imbalance and label noise heterogeneity in real-world FL scenarios.
List of keywords
Machine Learning -> ML: Federated learning
Machine Learning -> ML: Classification
Machine Learning -> ML: Robustness
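The first stage described in the FedNoRo abstract fits a Gaussian Mixture Model to loss indicators in order to separate clean from noisy clients. The sketch below shows that general idea in a heavily simplified form: it assumes a single scalar loss per client (the abstract mentions per-class indicators, whose exact form is not given here), fits a two-component one-dimensional mixture with plain EM, and flags clients assigned to the higher-loss component. All function names, the initialisation, and the 0.5 cut-off are assumptions for illustration.

```python
import math


def gaussian_pdf(v, mean, var):
    """Density of a 1-D Gaussian at v."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(values, means, variances, weights):
    """E-step: posterior probability of each component for each value."""
    out = []
    for v in values:
        dens = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
        total = sum(dens)
        if total == 0.0:  # both densities underflowed; split evenly
            out.append([0.5, 0.5])
        else:
            out.append([d / total for d in dens])
    return out


def fit_two_component_gmm(values, iters=50, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Initialisation (assumed): means at the min and max value, shared variance.
    """
    n = len(values)
    mean_all = sum(values) / n
    var_all = max(sum((v - mean_all) ** 2 for v in values) / n, var_floor)
    means = [min(values), max(values)]
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(iters):
        resp = responsibilities(values, means, variances, weights)
        for k in range(2):  # M-step, one component at a time
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; leave it unchanged
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(spread, var_floor)
            weights[k] = nk / n
    return means, variances, weights


def flag_noisy_clients(client_losses):
    """Return indices of clients assigned to the higher-loss component."""
    means, variances, weights = fit_two_component_gmm(client_losses)
    noisy = 0 if means[0] > means[1] else 1
    resp = responsibilities(client_losses, means, variances, weights)
    return [i for i, r in enumerate(resp) if r[noisy] > 0.5]


# Hypothetical average losses for six clients; the last two look noisy.
toy_losses = [0.10, 0.12, 0.11, 0.09, 0.95, 1.05]
noisy_clients = flag_noisy_clients(toy_losses)
```

A library implementation of Gaussian mixtures would normally replace the hand-written EM loop. It is written out here only to keep the example free of dependencies.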
3545
Graph Neural Convection-Diffusion with Heterophily
Kai Zhao, Qiyu Kang, Yang Song, Rui She, Sijie Wang, Wee Peng Tay
Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs. On heterophilic graphs, connected nodes are likely to be from different classes or to have dissimilar features. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to take into account both the diffusion of information due to homophily and the “convection” of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs compared to state-of-the-art methods.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Classification
3548
Analyzing and Combating Attribute Bias for Face Restoration
Zelin Li, Dan Zeng, Xiao Yan, Qiaomu Shen, Bo Tang
Face restoration (FR) recovers high-resolution (HR) faces from low-resolution (LR) faces and is challenging due to its ill-posed nature. With years of development, existing methods can produce quality HR faces with realistic details. However, we observe that key facial attributes (e.g., age and gender) of the restored faces can be dramatically different from those of the LR faces. We call this phenomenon attribute bias, which is fatal when using FR for applications such as surveillance and security. Thus, we argue that FR should consider not only image quality, as in existing works, but also attribute bias. To this end, we thoroughly analyze attribute bias with extensive experiments and find that two major causes are the lack of attribute information in LR faces and bias in the training data. Moreover, we propose the DebiasFR framework to produce HR faces with high image quality and accurate facial attributes. The key design is to explicitly model the facial attributes, which also allows adjusting the facial attributes of the output HR faces. Experimental results show that DebiasFR has comparable image quality but significantly smaller attribute bias when compared with state-of-the-art FR methods.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Bias, fairness and privacy
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
3556
Not Only Pairwise Relationships: Fine-Grained Relational Modeling for Multivariate Time Series Forecasting
Jinming Wu, Qi Qi, Jingyu Wang, Haifeng Sun, Zhikang Wu, Zirui Zhuang, Jianxin Liao
Recent graph-based methods achieve significant success in multivariate time series modeling and forecasting due to their ability to handle relationships among time series variables. However, only pairwise relationships are considered in most existing works. They ignore beyond-pairwise relationships and their potential categories in practical scenarios, which leads to incomplete relationship learning for multivariate time series forecasting. In this paper, we present ReMo, a Relational Modeling-based method, to promote fine-grained relational learning among multivariate time series data. Firstly, by treating time series variables and complex relationships as nodes and hyperedges, we extract multi-view hypergraphs from data to capture beyond-pairwise relationships. Secondly, a novel hypergraph message passing strategy is designed to characterize both nodes and hyperedges by inferring the potential categories of relationships and further distinguishing their impacts on time series variables. By integrating these two modules into the time series forecasting framework, ReMo effectively improves the performance of multivariate time series forecasting. The experimental results on seven commonly used datasets from different domains demonstrate the superiority of our model.
List of keywords
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining spatial and/or temporal data
3562
Acoustic NLOS Imaging with Cross Modal Knowledge Distillation
Ui-Hyeon Shin, Seungwoo Jang, Kwang-su Kim
Acoustic non-line-of-sight (NLOS) imaging aims to reconstruct hidden scenes by analyzing reflections of acoustic waves. Despite recent developments in the field, existing methods still have limitations, such as sensitivity to noise in physical models and difficulty in reconstructing unseen objects in deep learning models. To address these limitations, we propose a novel cross-modal knowledge distillation (CMKD) approach for acoustic NLOS imaging. Our method transfers knowledge from a well-trained image network to an audio network, effectively combining the strengths of both modalities. As a result, it is robust to noise and superior in reconstructing unseen objects. Additionally, we evaluate on real-world datasets and demonstrate that the proposed method outperforms state-of-the-art methods in acoustic NLOS imaging. The experimental results indicate that CMKD is an effective solution for addressing the limitations of current acoustic NLOS imaging methods.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs  
Machine Learning -> ML: Multi-modal learning
3566
GLPocket: A Multi-Scale Representation Learning Approach for Protein Binding Site Prediction
Peiying Li, Yongchang Liu, Shikui Tu, Lei Xu
Protein binding site prediction is an important prerequisite for the discovery of new drugs. Usually, a natural 3D U-Net is adopted as the standard site prediction framework to perform per-voxel binary mask classification. However, this scheme only performs feature extraction for single-scale samples, which may cause the loss of global or local information, resulting in incomplete, artifact-laden or even missed predictions. To tackle this issue, we propose a network called GLPocket, which is based on the 3D U-Net structure and utilizes multi-scale representation to predict binding sites. Firstly, GLPocket uses a Target Cropping Block (TCB) for targeted prediction. TCB selects the local representation of interest from the global representations to perform concentrated prediction, and reduces the amount of computation by 82%. It integrates global distribution information into local regions, making prediction more concentrated in the decoding stage. Secondly, GLPocket establishes long-range relationships among patches within the local region with a Transformer Block (TB), to enrich local context semantic information. Experiments show that GLPocket improves by 0.5%-4% on DCA Top-n prediction compared with previous state-of-the-art methods on four benchmark data sets. We will publish source code on GitHub after the article is accepted.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Computer Vision -> CV: Biomedical image analysis
3572
DeepPSL: End-to-end perception and reasoning
Sridhar Dasaratha, Sai Akhil Puranam, Karmvir Phogat, Sunil Tiyyagura, Nigel Duffy
We introduce DeepPSL, a variant of probabilistic soft logic (PSL), to produce an end-to-end trainable system that integrates reasoning and perception. PSL represents first-order logic in terms of a convex graphical model: hinge-loss Markov random fields (HL-MRFs). PSL stands out among probabilistic logic frameworks due to its tractability, having been applied to systems of more than 1 billion ground rules. The key to our approach is to represent predicates in first-order logic using deep neural networks and then to approximately back-propagate through the HL-MRF and thus train every aspect of the first-order system being represented. We believe that this approach represents an interesting direction for the integration of deep learning and reasoning techniques, with applications to knowledge base learning, multi-task learning, and explainability. Evaluation on three different tasks demonstrates that DeepPSL significantly outperforms state-of-the-art neuro-symbolic methods on scalability while achieving comparable or better accuracy.
List of keywords
Machine Learning -> ML: Neuro-symbolic methods
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Learning graphical models
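To make the hinge-loss formulation mentioned in the DeepPSL abstract more concrete, the sketch below computes the standard PSL "distance to satisfaction" of a ground rule `body -> head` under the Łukasiewicz relaxation, and sums weighted distances into an HL-MRF-style energy. It covers only this classical PSL ingredient, not DeepPSL's neural predicates or its approximate back-propagation. The rule encoding as `(weight, body_truths, head_truth)` tuples and the example truth values are assumptions made for this illustration.

```python
def lukasiewicz_and(*truths):
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body, head):
    """Hinge penalty for the ground rule (b1 AND b2 AND ...) -> head.

    The rule is fully satisfied (distance 0) when the head is at least
    as true as the soft conjunction of the body atoms.
    """
    return max(0.0, lukasiewicz_and(*body) - head)


def hinge_loss_energy(ground_rules, squared=False):
    """Weighted sum of (optionally squared) distances to satisfaction.

    `ground_rules` is a hypothetical list of
    (weight, body_truths, head_truth) tuples.
    """
    total = 0.0
    for weight, body, head in ground_rules:
        d = distance_to_satisfaction(body, head)
        total += weight * (d * d if squared else d)
    return total


# Illustrative rule: Friends(a, b) AND Smokes(a) -> Smokes(b)
toy_rules = [
    (2.0, [0.9, 0.8], 0.3),  # body is about 0.7 true, head 0.3: violated by ~0.4
    (1.0, [0.5, 0.4], 0.6),  # body is 0.0 true: trivially satisfied
]
energy = hinge_loss_energy(toy_rules)
```

Each distance is a maximum of linear functions of the truth values, so the energy is convex in them. This is the property behind the abstract's description of the HL-MRF as a convex graphical model.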
3573
Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Zheyu Zhang, Tianping Zhang, Jian Li
Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and CatBoost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
List of keywords
Machine Learning -> ML: Applications
Machine Learning -> ML: Classification
3636
Semi-supervised Domain Adaptation in Graph Transfer Learning
Ziyue Qiao, Xiao Luo, Meng Xiao, Hao Dong, Yuanchun Zhou, Hui Xiong
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding. Thus, the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Semi-supervised learning
3639
Don’t Ignore Alienation and Marginalization: Correlating Fraud Detection
Yilong Zang, Ruimin Hu, Zheng Wang, Xu Danni, Jia Wu, Dengshi Li, Junhang Wu, Lingfei Ren
The anonymity of online networks makes tackling fraud increasingly costly. Thanks to the superiority of graph representation learning, graph-based fraud detection has made significant progress in recent years. However, evolving fraud strategies produce more advanced and difficult scams. One common strategy is synergistic camouflage: combining multiple means to deceive others. Existing methods mostly investigate the differences between relations for individual frauds, neglecting the correlation among multi-relation fraudulent behaviors. In this paper, we design several statistics to validate the existence of synergistic camouflage of fraudsters by exploring the correlation among multi-relation interactions. From the multi-relation perspective, we find two distinctive features of fraudulent behaviors, i.e., alienation and marginalization. Based on this finding, we propose COFRAUD, a correlation-aware fraud detection model, which innovatively incorporates synergistic camouflage into fraud detection. It captures the correlation among multi-relation fraudulent behaviors. Experimental results on two public datasets demonstrate that COFRAUD achieves significant improvements over state-of-the-art methods.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Security and privacy
Data Mining -> DM: Applications
3654
Dual Prompt Learning for Continual Rain Removal from Single Images
Minghao Liu, Wenhan Yang, Yuzhang Hu, Jiaying Liu
Recent efforts have achieved remarkable progress in single-image deraining on data with a stationary distribution. However, catastrophic forgetting raises practical concerns when applying these methods to real applications, where the data distributions change constantly. In this paper, we investigate the continual learning issue for rain removal and develop a novel, efficient, continually learned deraining transformer. Unlike typical replay or regularization-based methods, which increase overall training time or parameter space, our method relies on compact prompts, which are learnable parameters, to maintain both task-invariant and task-specific knowledge. Our prompts are applied at both the image and feature levels to effectively leverage transferred knowledge of images and features among different tasks. We conduct comprehensive experiments on widely used rain removal datasets, where our proposed dual prompt learning consistently outperforms prior state-of-the-art methods. Moreover, we observe that, even though our method is designed for continual learning, it still achieves superior results on data with a stationary distribution, which further demonstrates the effectiveness of our method.
List of keywords
Computer Vision -> CV: Computational photography
3655
Open-world Semi-supervised Novel Class Discovery
Jiaming Liu, Yangqiming Wang, Tongze Zhang, Yulu Fan, Qinli Yang, Junming Shao
Traditional semi-supervised learning tasks assume that both labeled and unlabeled data follow the same class distribution, but realistic open-world scenarios are more complex, with unknown novel classes mixed into the unlabeled set. Therefore, it is a great challenge not only to recognize samples from known classes but also to discover an arbitrary number of novel classes from the unlabeled data. In this paper, we introduce a new open-world semi-supervised novel class discovery approach named OpenNCD, a progressive bi-level contrastive learning method over multiple prototypes. The proposed method is composed of two reciprocally enhanced parts. First, a bi-level contrastive learning method is introduced, which maintains the pair-wise similarity of the prototypes and the prototype group levels for better representation learning. Then, a reliable prototype similarity metric is proposed based on the common representing instances. Prototypes with high similarities are grouped progressively for known class matching and novel class discovery. Extensive experiments on three image datasets are conducted, and the results show the effectiveness of the proposed method in open-world scenarios, especially with fewer known classes and labels.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Self-supervised Learning
3663
Diverse Approximations for Monotone Submodular Maximization Problems with a Matroid Constraint
Anh Do, Mingyu Guo, Aneta Neumann, Frank Neumann
Finding diverse solutions to optimization problems has been of practical interest for several decades, and has recently enjoyed increasing attention in research. While submodular optimization has been rigorously studied in many fields, its diverse solutions extension has not. In this study, we consider the most basic variants of submodular optimization, and propose two simple greedy algorithms, which are known to be effective at maximizing monotone submodular functions. These are equipped with parameters that control the trade-off between objective and diversity. Our theoretical contribution shows their approximation guarantees in both objective value and diversity, as functions of their respective parameters. Our experimental investigation with maximum coverage instances demonstrates their empirical differences in terms of objective-diversity trade-offs.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
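The last abstract above builds on greedy algorithms for monotone submodular maximization and evaluates on maximum coverage instances. As background only, the following minimal sketch shows the classical greedy algorithm for maximum coverage under a cardinality constraint (the simplest matroid, the uniform one). It does not include the diversity parameters of the paper's algorithms, which the abstract does not detail. The function name and the toy instance are assumptions for this example.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one with the largest
    marginal gain in newly covered elements.

    Returns (chosen_indices, covered_elements). Ties are broken in
    favour of the lowest index.
    """
    chosen = []
    covered = set()
    for _ in range(min(k, len(sets))):
        best_index, best_gain = None, 0
        for i, candidate in enumerate(sets):
            if i in chosen:
                continue
            gain = len(candidate - covered)  # marginal coverage gain
            if gain > best_gain:
                best_index, best_gain = i, gain
        if best_index is None:
            break  # no remaining set adds anything new
        chosen.append(best_index)
        covered |= sets[best_index]
    return chosen, covered


# Toy instance: four candidate sets over the ground set {1, ..., 7}.
toy_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
picked, coverage = greedy_max_coverage(toy_sets, k=2)
```

Coverage is monotone and submodular, because the gain from adding a set can only shrink as more elements are already covered. For the cardinality-constrained case this greedy rule is known to achieve a (1 - 1/e) approximation.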