Accepted Papers List
11
StockFormer: Learning Hybrid Trading Machines with Predictive Coding
Typical RL-for-finance solutions directly optimize trading policies over the noisy market data, such as stock prices and trading volumes, without explicitly considering the future trends and correlations of different investment assets as we humans do. In this paper, we present StockFormer, a hybrid trading machine that integrates the forward modeling capabilities of predictive coding with the advantages of RL agents in policy flexibility. The predictive coding part consists of three Transformer branches with modified structures, which respectively extract effective latent states of long-/short-term future dynamics and asset relations. The RL agent adaptively fuses these states and then executes an actor-critic algorithm in the unified state space. The entire model is jointly trained by propagating the critic’s gradients back to the predictive coding module. StockFormer significantly outperforms existing approaches across three publicly available financial datasets in terms of portfolio returns and Sharpe ratios.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Deep reinforcement learning
40
PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem
The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. In the last decade, although SIP is a theoretically hard problem, researchers have designed various algorithms for solving it. In this work, we propose three main heuristics and develop an improved exact algorithm for SIP. First, we design a probing search procedure that tests whether the search can obtain a solution at the first attempt. Second, we design a novel matching ordering as a value-ordering heuristic, which uses information obtained from the probing search procedure to preferentially select promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of SIP and present an adaptive propagation method to strike a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for SIP.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
53
Non-Obvious Manipulability in Extensive-Form Mechanisms: The Revelation Principle for Single-Parameter Agents
Recent work in algorithmic mechanism design focuses on designing mechanisms for agents with bounded rationality, modifying the constraints that must be satisfied in order to achieve incentive compatibility.  Starting with Li’s strengthening of strategyproofness, obvious strategyproofness (OSP) requires truthtelling to be "obvious" over dishonesty, roughly meaning that the worst outcome from truthful actions must be no worse than the best outcome for dishonest ones. A celebrated result for dominant-strategy incentive-compatible mechanisms that allows us to restrict attention to direct mechanisms, known as the revelation principle, does not hold for OSP: the implementation details matter for the obvious incentive properties of the mechanism. Studying agent strategies in real-life mechanisms, Troyan and Morrill introduce a relaxation of strategyproofness known as non-obvious manipulability, which only requires comparing certain extrema of the agents’ utility functions in order for a mechanism to be incentive-compatible. Specifically a mechanism is not obviously manipulable (NOM) if the best and worst outcomes when acting truthfully are no worse than the best and worst outcomes when acting dishonestly. In this work we first extend the cycle monotonicity framework for direct-revelation NOM mechanism design to indirect mechanisms. We then apply this to two settings, single-parameter agents and mechanisms for two agents in which one has a two-value domain, and show that under these models the revelation principle holds: direct mechanisms are just as powerful as indirect ones.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
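The informal NOM condition stated in the abstract above (best and worst truthful outcomes no worse than the best and worst dishonest outcomes) can be illustrated with a minimal sketch. It checks the condition for a single agent with one fixed true type in a direct mechanism. The function names, the two toy auction rules, the bid grid, and the tie-breaking rule (the agent loses ties) are assumptions made for illustration and are not taken from the paper.

```python
def is_not_obviously_manipulable(utility, true_type, reports, others_reports):
    """Check the informal NOM condition for one agent with a fixed true type.

    utility(true_type, report, other_report) is the agent's utility.
    The agent must not improve either the best case or the worst case by lying.
    """
    truthful = [utility(true_type, true_type, o) for o in others_reports]
    for lie in reports:
        if lie == true_type:
            continue
        lying = [utility(true_type, lie, o) for o in others_reports]
        if max(lying) > max(truthful) or min(lying) > min(truthful):
            return False
    return True


def second_price(value, bid, other_bid):
    # Hypothetical two-bidder rule: the winner pays the other bid; ties are lost.
    return value - other_bid if bid > other_bid else 0


def first_price(value, bid, other_bid):
    # Hypothetical two-bidder rule: the winner pays their own bid; ties are lost.
    return value - bid if bid > other_bid else 0


BIDS = [0, 1, 2, 3]  # assumed discrete bid grid
nom_second = is_not_obviously_manipulable(second_price, 2, BIDS, BIDS)
nom_first = is_not_obviously_manipulable(first_price, 2, BIDS, BIDS)
```

In this toy setting the second-price rule passes the check, while the first-price rule fails it because underbidding raises the best-case utility above the truthful best case. The full definition quantifies over every agent and every true type, which this sketch does not do.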
58
HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE
The factor model is a fundamental tool in quantitative investment, and it can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations. However, building a factor model that can conduct stock prediction in an online and adaptive setting, where the model adapts itself to match the current market regime identified from only point-in-time market information, remains an open question. To tackle this problem, we propose the first deep learning based online and adaptive factor model, HireVAE, at the core of which is a hierarchical latent space that embeds the underlying relationship between the market situation and stock-wise latent factors, so that HireVAE can effectively estimate useful latent factors given only historical market information and subsequently predict accurate stock returns. Across four commonly used real stock market benchmarks, the proposed HireVAE demonstrates superior performance in terms of active returns over previous methods, verifying the potential of such an online and adaptive factor model.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Finance
Machine Learning -> ML: Applications
71
Teaching What You Should Teach: A Data-Based Distillation Method
In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias to find out what meets the teacher’s strengths but the student’s weaknesses, by learning magnitudes and probabilities to generate suitable data samples. By training the data augmentation module and the generalized distillation paradigm alternately, a student model is learned with excellent generalization ability. To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
84
Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend
Adversarial attacks are commonly regarded as a serious threat to neural networks because of the misleading behavior they induce. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch, together with the mutual learning mechanism with the same learning ratio applied in prior defense strategies, is the main cause of the accuracy degradation on benign samples. AdvAmd is shown to steadily heal the accuracy degradation and even to yield an accuracy boost for common neural models on benign classification, object detection, and segmentation tasks. Quantitative and ablation experiments attribute the efficacy of AdvAmd to three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).
List of keywords
Machine Learning -> ML: Robustness
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Constraint Satisfaction and Optimization -> CSO: Applications
86
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a dedicated designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Scene analysis and understanding
87
Self-supervised Graph Disentangled Networks for Review-based Recommendation
User review data is considered as auxiliary information to alleviate the data sparsity problem and improve the quality of learned user/item or interaction representations in review-based recommender systems. However, existing methods usually model user-item interactions in a holistic manner and neglect the entanglement of the latent intents behind them, e.g., price, quality, or appearance, resulting in suboptimal representations and reducing interpretability. In this paper, we propose a Self-supervised Graph Disentangled Networks for review-based recommendation (SGDN), to separately model the user-item interactions based on the latent factors through the textual review data. To this end, we first model the distributions of interactions over latent factors from both semantic information in review data and structural information in user-item graph data, forming several factor graphs. Then a factorized message passing mechanism is designed to learn disentangled user/item and interaction representations on the factor graphs. Finally, we set an intent-aware contrastive learning task to alleviate the sparsity issue and encourage disentanglement through dynamically identifying positive and negative samples based on the learned intent distributions. Empirical results over five benchmark datasets validate the superiority of  SGDN over the state-of-the-art methods and the  interpretability of learned intent factors.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Collaborative filtering
100
Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi’s Entropy Perspective
Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue,  we first propose a novel information theoretical measure: kernelized Rényi’s entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon’s entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Rényi’s entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
List of keywords
Machine Learning -> ML: Theory of deep learning
Machine Learning -> ML: Learning theory
108
A Regular Matching Constraint for String Variables
Using a regular language as a pattern for string matching is nowadays a common (and sometimes unsafe) operation, provided as a built-in feature by most programming languages. A proper constraint solver over string variables should support most of the operations over regular expressions and related constructs. However, state-of-the-art string solvers natively support only the membership relation of a string variable to a regular language. Here we take a step forward by defining a specialised propagator for the match operation, returning the leftmost position where a pattern can match a given string. Empirical evidence shows the effectiveness of our approach, implemented within the constraint programming framework, and tested against state-of-the-art string solvers.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
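For readers unfamiliar with the match operation named in the abstract above, the following minimal sketch shows it on a concrete (fully known) string using Python's standard `re` module. It is only the built-in language feature the abstract refers to, not the paper's propagator over string variables. The helper name `leftmost_match`, the 0-based index, and the use of `None` for "no match" are assumptions of this sketch.

```python
import re


def leftmost_match(pattern, text):
    """Return the leftmost index where `pattern` matches in `text`, or None."""
    # re.search scans the string and returns the first (leftmost) match found.
    m = re.search(pattern, text)
    return m.start() if m else None


position = leftmost_match(r"ab+", "xxabbab")   # pattern occurs twice; leftmost wins
no_position = leftmost_match(r"z", "abc")      # pattern does not occur
```

A constraint solver has to reason about the same relation when the string is itself an unknown variable, which is the setting the paper addresses.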
109
Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning
Conversational recommendation systems (CRS) aim to acquire users’ dynamic preferred attributes through conversations in a timely and proactive manner for item recommendation. In each turn of CRS, there are naturally two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) to estimate the effectiveness of the director’s option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the adverse effect of model bias on the mutual influence between the director and actor, we model the director’s option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
List of keywords
Data Mining -> DM: Recommender systems
133
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Video action segmentation under timestamp supervision has recently received much attention due to lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which includes the following two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences, where the frames in ambiguous intervals have no pseudo-labels. Second, iterative clustering iteratively propagates the pseudo-labels to the ambiguous intervals by clustering, and thus updates the pseudo-label sequences to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
List of keywords
Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
162
Decoupling with Entropy-based Equalization for Semi-Supervised Semantic Segmentation
Semi-supervised semantic segmentation methods are the main solution to alleviate the problem of high annotation consumption in semantic segmentation. However, the class imbalance problem makes the model favor the head classes with sufficient training samples, resulting in poor performance of the tail classes. To address this issue, we propose a Decoupled Semi-Supervise Semantic Segmentation (DeS4) framework based on the teacher-student model. Specifically, we first propose a decoupling training strategy to split the training of the encoder and segmentation decoder, aiming at a balanced decoder. Then, a non-learnable prototype-based segmentation head is proposed to regularize the category representation distribution consistency and perform a better connection between the teacher model and the student model. Furthermore, a Multi-Entropy Sampling (MES) strategy is proposed to collect pixel representation for updating the shared prototype to get a class-unbiased head. We conduct extensive experiments of the proposed DeS4 on two challenging benchmarks (PASCAL VOC 2012 and Cityscapes) and achieve remarkable improvements over the previous state-of-the-art methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Segmentation
169
MM-PCQA: Multi-Modal Learning for No-reference Point Cloud Quality Assessment
The visual quality of point clouds has been greatly emphasized since the ever-increasing 3D vision applications are expected to provide cost-effective and high-quality experiences for users. Looking back on the development of point cloud quality assessment (PCQA), the visual quality is usually evaluated by utilizing single-modal information, i.e., either extracted from the 2D projections or 3D point cloud. The 2D projections contain rich texture and semantic information but are highly dependent on viewpoints, while the 3D point clouds are more sensitive to geometry distortions and invariant to viewpoints. Therefore, to leverage the advantages of both point cloud and projected image modalities, we propose a novel no-reference Multi-Modal Point Cloud Quality Assessment (MM-PCQA) metric. Specifically, we split the point clouds into sub-models to represent local geometry distortions such as point shift and down-sampling. Then we render the point clouds into 2D image projections for texture feature extraction. To this end, the sub-models and projected images are encoded with point-based and image-based neural networks. Finally, symmetric cross-modal attention is employed to fuse multi-modal quality-aware information. Experimental results show that our approach outperforms all compared state-of-the-art methods and is far ahead of previous no-reference PCQA methods, which highlights the effectiveness of the proposed method. The code is available at https://github.com/zzc-1998/MM-PCQA.
List of keywords
Computer Vision -> CV: 3D computer vision
Machine Learning -> ML: Multi-modal learning
174
Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism
Communication can impressively improve cooperation in multi-agent reinforcement learning (MARL), especially for partially-observed tasks. However, existing works either broadcast the messages leading to information redundancy, or learn targeted communication by modeling all the other agents as targets, which is not scalable when the number of agents varies. In this work, to tackle the scalability problem of MARL communication for partially-observed tasks, we propose a novel framework Transformer-based Email Mechanism (TEM). The agents adopt local communication to send messages only to the ones that can be observed without modeling all the agents. Inspired by human cooperation with email forwarding, we design message chains to forward information to cooperate with the agents outside the observation range. We introduce Transformer to encode and decode the message chain to choose the next receiver selectively. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Multi-robot systems
193
SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles
A critical concern in data-driven processes is to build models whose outcomes do not discriminate against some protected groups. In learning tasks, knowledge of the group attributes is essential to ensure non-discrimination, but in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of individuals’ sensitive information while also allowing it to learn non-discriminatory predictors. A key feature of the proposed model is to enable the use of off-the-shelf and non-private fair models to create a privacy-preserving and fair model. The paper analyzes the relation between accuracy, privacy, and fairness, and assesses the benefits of the proposed models on several prediction tasks. In particular, this proposal allows both scalable and accurate training of private and fair models for very large neural networks.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Data Mining -> DM: Privacy-preserving data mining
Machine Learning -> ML: Multi-task and transfer learning
194
MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels
Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction to reduce the cost is to learn with noisy labels, which are ubiquitous in real-world applications. A critical challenge for such a learning task is to reduce the effect of network memorization on the falsely-labeled data. In this work, we propose an iterative selection approach based on the Weibull mixture model, which identifies clean data by considering the overall learning dynamics of each data instance. In contrast to the previous small-loss heuristics, we leverage the observation that deep networks memorize clean data easily and forget it with difficulty. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times between being misclassified and being memorized in training, and integrate them into a novel metric for selection. Based on the proposed metric, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset, which is finally used for model training. To validate our method, we perform extensive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision
199
Approximate Envy-Freeness in Graphical Cake Cutting
We study the problem of fairly allocating a divisible resource in the form of a graph, also known as graphical cake cutting. Unlike for the canonical interval cake, a connected envy-free allocation is not guaranteed to exist for a graphical cake. We focus on the existence and computation of connected allocations with low envy. For general graphs, we show that there is always a 1/2-additive-envy-free allocation and, if the agents’ valuations are identical, a (2+\epsilon)-multiplicative-envy-free allocation for any \epsilon > 0. In the case of star graphs, we obtain a multiplicative factor of 3+\epsilon for arbitrary valuations and 2 for identical valuations. We also derive guarantees when each agent can receive more than one connected piece. All of our results come with efficient algorithms for computing the respective allocations.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
200
LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search
Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges for efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods, but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6% in ImageNet under mobile constraints, best-in-class Kendall-Tau, architectural diversity, and search space size.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Automated machine learning
203
FedSampling: A Better Sampling Strategy for Federated Learning
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different data sizes, and clients with more data do not get more opportunities to contribute to model training, which may lead to inferior performance. In this paper, instead of client uniform sampling, we propose a novel data uniform sampling strategy for federated learning (FedSampling), which can effectively improve the performance of federated learning, especially when the client data size distribution is highly imbalanced. In each federated learning round, local data on each client is randomly sampled for local model learning according to a probability based on the server’s desired sample size and the total sample size on all available clients. Since the data size on each client is privacy-sensitive, we propose a privacy-preserving way to estimate the total sample size with a differential privacy guarantee. Experiments on four benchmark datasets show that FedSampling can effectively improve the performance of federated learning.
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Privacy-preserving data mining
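The data uniform sampling idea in the abstract above can be sketched as follows. The sketch assumes the sampling probability is simply the desired sample size divided by the total sample size, capped at 1, and that the total is known exactly. The abstract only says the probability is "based on" these two quantities and that the total is estimated under differential privacy, so both points are assumptions, and the function names are hypothetical.

```python
import random


def sampling_probability(desired_size, total_size):
    """Assumed rule: keep each sample with probability desired/total, capped at 1."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    return min(1.0, desired_size / total_size)


def sample_local_data(local_data, prob, rng):
    """Keep each local sample independently with probability `prob`."""
    return [x for x in local_data if rng.random() < prob]


# Three clients with highly imbalanced data sizes (illustrative values).
rng = random.Random(0)
clients = {"a": list(range(1000)), "b": list(range(100)), "c": list(range(10))}
total = sum(len(d) for d in clients.values())
prob = sampling_probability(desired_size=111, total_size=total)
selected = {name: sample_local_data(d, prob, rng) for name, d in clients.items()}
```

Because every sample is kept with the same probability, a client's expected contribution is proportional to its data size, which is the contrast with uniform client sampling that the abstract draws.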
204
A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning
We consider the problem of risk-aware Markov Decision Processes (MDPs) for Safe AI. We introduce a theoretical framework, Extended Markov Ratio Decision Processes (EMRDP), that incorporates risk into MDPs and embeds environment learning into this framework. We propose an algorithm to find the optimal policy for EMRDP with theoretical guarantees. Under a certain monotonicity assumption, this algorithm runs in strongly-polynomial time both in the discounted and expected average reward models. We validate our algorithm empirically on a Grid World benchmark, evaluating its solution quality, required number of steps, and numerical stability. We find its solution quality to be stable under data noising, while its required number of steps grows with added noise. We observe its numerical stability compared to global methods.
List of keywords
Planning and Scheduling -> PS: Markov decisions processes
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Sequential decision making
222
Shhh! The Logic of Clandestine Operations
An operation is called covert if it conceals the identity of the actor; it is called clandestine if the very fact that the operation is conducted is concealed. The paper proposes a formal semantics of clandestine operations and introduces a sound and complete logical system that describes the interplay between the distributed knowledge modality and a modality capturing coalition power to conduct clandestine operations.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
223
Appearance Prompt Vision Transformer for Connectome Reconstruction
Neural connectivity reconstruction aims to understand the function of biological reconstruction and promote basic scientific research. The intricate morphology and densely intertwined branches make it an extremely challenging task. Most previous best-performing methods adopt affinity learning or metric learning. Nevertheless, they either neglect to model explicit voxel semantics caused by implicit optimization or are hysteresis to spatial information. Furthermore, the inherent locality of 3D CNNs limits modeling long-range dependencies, leading to sub-optimal results. In this work, we propose a coherent and unified Appearance Prompt Vision Transformer (APViT) to integrate affinity and metric learning to exploit the complementarity by learning long-range spatial dependencies. The proposed APViT enjoys several merits. First, the extension continuity-aware attention module aims at constructing hierarchical attention customized for neuron extensibility and slice continuity to learn instance voxel semantic context from a global perspective and utilize continuity priors to enhance voxel spatial awareness. Second, the appearance prompt modulator is responsible for leveraging voxel-adaptive appearance knowledge conditioned on affinity rich in spatial information to instruct instance voxel semantics, exploiting the potential of affinity learning to complement metric learning. Extensive experimental results on multiple challenging benchmarks demonstrate that our APViT achieves consistent improvements with huge flexibility under the same post-processing strategy.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
228
Adversarial Behavior Exclusion for Safe Reinforcement Learning
Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, this learning process makes RL inherently too vulnerable to be used in real-world applications where safety is of utmost importance. Most prior studies consider exploration at odds with safety and thereby restrict it using either joint optimization of task and safety or imposing constraints for safe exploration. This paper migrates from the current convention to using exploration as a key to safety by learning safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent’s safety violations by approximating an optimal adversary utilizing exploration and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall and therefore can be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision processes (CMDP) environments under 2 white-box action space perturbations as well as with changes in environment dynamics against 7 baselines. Consistently, AdvEx-RL outperforms the baselines by achieving an average safety performance of over 75% in the continuous action space with 10 times more variations in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis as we show in our user study.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
243
A Solution to Co-occurence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition
Recent studies on pedestrian attribute recognition progress with either explicit or implicit modeling of the co-occurrence among attributes. Considering that this known prior is highly variable and unforeseeable regarding the specific scenarios, we show that current methods can actually suffer in generalizing such fitted attributes interdependencies onto scenes or identities off the dataset distribution, resulting in the underlined bias of attributes co-occurrence. To render models robust in realistic scenes, we propose the attributes-disentangled feature learning to ensure the recognition of an attribute not inferring on the existence of others, and which is sequentially formulated as a problem of mutual information minimization. Rooting from it, practical strategies are devised to efficiently decouple attributes, which substantially improve the baseline and establish state-of-the-art performance on realistic datasets like PETAzs and RAPzs.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Bias, fairness and privacy
251
Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation
Most existing ultra-high resolution (UHR) segmentation methods always struggle in the dilemma of balancing memory cost and local characterization accuracy, which are both taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) that achieves impressive performances. In this work, GPWFormer is a Transformer (T)-CNN (C) mutual learning framework, where T takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while C takes downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, T partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. In addition, masks produced by C are utilized to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.
List of keywords
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Segmentation
256
A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification
Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model’s predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known “confirmation bias” problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method’s effectiveness and superior performance on sequence-level and repertoire-level tasks. Code available at https://github.com/TencentAILabHealthcare/NLL-IRC.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine
260
Generalization Bounds for Adversarial Metric Learning
Recently, adversarial metric learning has been proposed to enhance the robustness of the learned distance metric against adversarial perturbations. Despite rapid progress in validating its effectiveness empirically, theoretical guarantees on adversarial robustness and generalization are far less understood. To fill this gap, this paper focuses on unveiling the generalization properties of adversarial metric learning by developing the uniform convergence analysis techniques. Based on the capacity estimation of covering numbers, we establish the first high-probability generalization bounds with order O(n^{-1/2}) for adversarial metric learning with pairwise perturbations and general losses, where n is the number of training samples. Moreover, we obtain the refined generalization bounds with order O(n^{-1}) for the smooth loss by using local Rademacher complexity, which is faster than the previous result of adversarial pairwise learning, e.g., adversarial bipartite ranking. Experimental evaluation on real-world datasets validates our theoretical findings.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Machine Learning -> ML: Learning theory
263
Improving LaCAM for Scalable Eventually Optimal Multi-Agent Pathfinding
This study extends the recently-developed LaCAM algorithm for multi-agent pathfinding (MAPF). LaCAM is a sub-optimal search-based algorithm that uses lazy successor generation to dramatically reduce the planning effort. We present two enhancements. First, we propose its anytime version, called LaCAM*, which eventually converges to optima, provided that solution costs are accumulated transition costs. Second, we improve the successor generation to quickly obtain initial solutions. Exhaustive experiments demonstrate their utility. For instance, LaCAM* sub-optimally solved 99% of the instances retrieved from the MAPF benchmark, where the number of agents varied up to a thousand, within ten seconds on a standard desktop PC, while ensuring eventual convergence to optima; developing a new horizon of MAPF algorithms.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Distributed and multi-agent planning
Robotics -> ROB: Motion and path planning
Planning and Scheduling -> PS: Planning algorithms
268
NerCo: A Contrastive Learning Based Two-Stage Chinese NER Method
Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition (NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in target representation space, which could finally negatively affect the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. We then present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo first guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart those of different classes. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities
Natural Language Processing -> NLP: Tagging, chunking, and parsing
283
A Canonicalization-Enhanced Known Fact-Aware Framework For Open Knowledge Graph Link Prediction
Open knowledge graph (OpenKG) link prediction aims to predict missing factual triples in the form of (head noun phrase, relation phrase, tail noun phrase). Since triples are not canonicalized, previous methods either focus on canonicalizing noun phrases (NPs) to reduce graph sparsity, or utilize textual forms to improve type compatibility. However, they neglect to canonicalize relation phrases (RPs) and triples, making OpenKG maintain high sparsity and impeding the performance. To address the above issues, we propose a Canonicalization-Enhanced Known Fact-Aware (CEKFA) framework that boosts link prediction performance through sparsity reduction of RPs and triples. First, we propose a similarity-driven RP canonicalization method to reduce RPs’ sparsity by sharing knowledge of semantically similar ones. Second, to reduce the sparsity of triples, a known fact-aware triple canonicalization method is designed to retrieve relevant known facts from training data. Finally, these two types of canonical information are integrated into a general two-stage re-ranking framework that can be applied to most existing knowledge graph embedding methods. Experiment results on two OpenKG datasets, ReVerb20K and ReVerb45K, show that our approach achieves state-of-the-art results. Extensive experimental analyses illustrate the effectiveness and generalization ability of the proposed framework.
List of keywords
Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Information retrieval
Natural Language Processing -> NLP: Applications
292
Recognizable Information Bottleneck
Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generalization bound. However, it requires the computation of expensive second-order curvature, which hinders its practical application. In this paper, we establish the connection between the recognizability of representations and the recent functional conditional mutual information (f-CMI) generalization bound, which is significantly easier to estimate. On this basis we propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic optimized by density ratio matching under the Bregman divergence. Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.
List of keywords
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Classification
298
3D Surface Super-resolution from Enhanced 2D Normal Images: A Multimodal-driven Variational AutoEncoder Approach
3D surface super-resolution is an important technical tool in virtual reality, and it is also a research hotspot in computer vision. Due to the unstructured and irregular nature of 3D object data, it is usually difficult to obtain high-quality surface details and geometry textures via a low-cost hardware setup. In this paper, we establish a multimodal-driven variational autoencoder (mmVAE) framework to perform 3D surface enhancement based on 2D normal images. To fully leverage the multimodal learning, we investigate a multimodal Gaussian mixture model (mmGMM) to align and fuse the latent feature representations from different modalities, and further propose a cross-scale encoder-decoder structure to reconstruct high-resolution normal images. Experimental results on several benchmark datasets demonstrate that our method delivers promising surface geometry structures and details in comparison with competitive advances.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: 3D computer vision
299
Model Conversion via Differentially Private Data-Free Distillation
While massive valuable deep models trained on large-scale data have been released to facilitate the artificial intelligence community, they may encounter attacks in deployment that lead to privacy leakage of training data. In this work, we propose a learning approach termed differentially private data-free distillation (DPDFD) for model conversion that can convert a pretrained model (teacher) into its privacy-preserving counterpart (student) via an intermediate generator without access to training data. The learning collaborates three parties in a unified way. First, massive synthetic data are generated with the generator. Then, they are fed into the teacher and student to compute differentially private gradients by normalizing the gradients and adding noise before performing descent. Finally, the student is updated with these differentially private gradients and the generator is updated by taking the student as a fixed discriminator in an alternate manner. In addition to a privacy-preserving student, the generator can generate synthetic data in a differentially private way for other down-stream tasks. We theoretically prove that our approach can guarantee differential privacy and good convergence. Extensive experiments that significantly outperform other differentially private generative approaches demonstrate the effectiveness of our approach.
List of keywords
Data Mining -> DM: Privacy-preserving data mining
Computer Vision -> CV: Bias, fairness and privacy
307
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
The performance of existing supervised neuron segmentation methods is highly dependent on the number of accurate annotations, especially when applied to large scale electron microscopy (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pretraining inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model can capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation. Code is available at https://github.com/ydchen0806/dbMiM.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Self-supervised Learning
326
TDG4Crowd: Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing
In crowdsourcing, existing efforts mainly use real datasets collected from crowdsourcing as test datasets to evaluate the effectiveness of aggregation algorithms. However, these works ignore the fact that the datasets obtained by crowdsourcing are usually sparse and imbalanced due to limited budget. As a result, applying the same aggregation algorithm on different datasets often leads to contradictory conclusions. For example, on the RTE dataset, the Dawid and Skene model performs significantly better than Majority Voting, while on the LabelMe dataset, the experiments give the opposite conclusion. It is challenging to obtain comprehensive and balanced datasets at a low cost. To the best of our knowledge, little effort has been made toward the fair evaluation of aggregation algorithms. To fill this gap, we propose a novel method named TDG4Crowd that can automatically generate comprehensive and balanced datasets. Using the Kullback-Leibler divergence and the Kolmogorov–Smirnov test, the experimental results show the superiority of our method compared with others. Aggregation algorithms also perform more consistently on the synthetic datasets generated using our method.
List of keywords
Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Cost-sensitive learning
328
Finding Mixed-Strategy Equilibria of Continuous-Action Games without Gradients Using Randomized Policy Networks
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients. Such game access is common in reinforcement learning settings, where the environment is typically treated as a black box. To tackle this problem, we apply zeroth-order optimization techniques that combine smoothed gradient estimators with equilibrium-finding dynamics.
We model players’ strategies using artificial neural networks. In particular, we use randomized policy networks to model mixed strategies. These take noise in addition to an observation as input and can flexibly represent arbitrary observation-dependent, continuous-action distributions. Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria.
We evaluate the performance of our method using an approximation of the Nash convergence metric from game theory, which measures how much players can benefit from unilaterally changing their strategy.
We apply our method to continuous Colonel Blotto games, single-item and multi-item auctions, and a visibility game.
The experiments show that our method can quickly find a high-quality approximate equilibrium.
Furthermore, they show that the dimensionality of the input noise is crucial for performance.
To our knowledge, this paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
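The Nash convergence metric mentioned in the abstract above sums, over players, how much each could gain by unilaterally switching to a best response. A minimal sketch for a finite two-player matrix game follows. The paper itself targets continuous-action games and uses an approximation of the metric, so the matrix setting, the function names, and the matching-pennies payoffs are illustrative assumptions only.

```python
def expected_payoff(payoff, row_strategy, col_strategy):
    """Expected payoff of a matrix game under two mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * payoff[i][j]
        for i in range(len(row_strategy))
        for j in range(len(col_strategy))
    )


def nash_conv(payoff_row, payoff_col, row_strategy, col_strategy):
    """Sum of each player's gain from deviating to a pure best response."""
    n_rows, n_cols = len(row_strategy), len(col_strategy)
    row_value = expected_payoff(payoff_row, row_strategy, col_strategy)
    col_value = expected_payoff(payoff_col, row_strategy, col_strategy)
    # Best pure-strategy payoff for the row player against the column mix.
    best_row = max(
        sum(col_strategy[j] * payoff_row[i][j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # Best pure-strategy payoff for the column player against the row mix.
    best_col = max(
        sum(row_strategy[i] * payoff_col[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return (best_row - row_value) + (best_col - col_value)


# Matching pennies: zero-sum, no pure-strategy equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
at_equilibrium = nash_conv(A, B, [0.5, 0.5], [0.5, 0.5])
off_equilibrium = nash_conv(A, B, [1.0, 0.0], [1.0, 0.0])
```

The metric is zero at the uniform mixed equilibrium and positive when a player can profit by deviating, which is why a game like this needs mixed strategies to reach a low value.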
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
339
SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations
Robotic agents trained using reinforcement learning have the problem of taking unreliable actions in an out-of-distribution (OOD) state. Agents can easily become OOD in real-world environments because it is almost impossible for them to visit and learn the entire state space during training. Unfortunately, unreliable actions do not ensure that agents perform their original tasks successfully. Therefore, agents should be able to recognize whether they are in OOD states and learn how to return to the learned state distribution rather than continue to take unreliable actions. In this study, we propose a novel method for retraining agents to recover from OOD situations in a self-supervised manner when they fall into OOD states. Our in-depth experimental results demonstrate that our method substantially improves the agent’s ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. Moreover, we show that our method can retrain the agent to recover from OOD situations even when in-distribution states are difficult to visit through exploration. Code and supplementary materials are available at https://github.com/SNUChanKim/SeRO.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Self-supervised Learning
Robotics -> ROB: Learning in robotics
345
OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking
The two-stage point-to-box network plays a critical role in the recent popular 3D Siamese tracking paradigm, which first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. Towards these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network’s discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Motion and tracking
Robotics -> ROB: Robotics and vision
350
Calibrating a Deep Neural Network with Its Predecessors
Confidence calibration – the process to calibrate the output probability distribution of neural networks – is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitations of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves the state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift. Supplementary material and code are available at https://github.com/Linwei94/PCS
List of keywords 
Machine Learning -> ML: Classification
375
Spike Count Maximization for Neuromorphic Vision Recognition
Spiking Neural Networks (SNNs) are the promising models of neuromorphic vision recognition. The mean square error (MSE) and cross-entropy (CE) losses are widely applied to supervise the training of SNNs on neuromorphic datasets. However, the relevance between the output spike counts and predictions is not well modeled by the existing loss functions. This paper proposes a Spike Count Maximization (SCM) training approach for the SNN-based neuromorphic vision recognition model based on optimizing the output spike counts. The SCM is achieved by structural risk minimization (SRM) and a specially designed spike counting loss. The spike counting loss counts the output spikes of the SNN by using the L0-norm, and the SRM maximizes the distance between the margin boundaries of the classifier to ensure the generalization of the model. The SCM is non-smooth and non-differentiable, and we design a two-stage algorithm with fast convergence to solve the problem. Experiment results demonstrate that the SCM performs satisfactorily in most cases. Using the output spikes for prediction, the accuracies of SCM are 2.12%~16.50% higher than the popular training losses on the CIFAR10-DVS dataset. The code is available at https://github.com/TJXTT/SCM-SNN.
List of keywords
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
379
Eliminating the Computation of Strongly Connected Components in Generalized Arc Consistency Algorithm for AllDifferent Constraint
AllDifferent constraint is widely used in Constraint Programming to model real world problems. Existing Generalized Arc Consistency (GAC) algorithms map an AllDifferent constraint onto a bipartite graph and utilize the structure of Strongly Connected Components (SCCs) in the graph to filter values. Calculating SCCs is time-consuming in the existing algorithms, so we propose a novel GAC algorithm for AllDifferent constraint in this paper, which eliminates the computation of SCCs. We prove that all redundant edges in the bipartite graph point to some alternating cycles. Our algorithm exploits this property and uses a more efficient method to filter values, which is based on breadth-first search. Experimental results on the XCSP3 benchmark suite show that our algorithm considerably outperforms the state-of-the-art GAC algorithms.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Constraint Satisfaction and Optimization -> CSO: Constraint programming
382
Learning to Learn from Corrupted Data for Few-Shot Learning
Few-shot learning, which aims to generalize knowledge learned from annotated base training data to recognize unseen novel classes, has attracted considerable attention. Existing few-shot methods rely on completely clean training data. However, in the real world, the training data are always corrupted and accompanied by noise due to disturbances in data transmission and low-quality annotation, which severely degrades the performance and generalization capability of few-shot models. To address the problem, we propose a unified peer-collaboration learning (PCL) framework to extract valid knowledge from corrupted data for few-shot learning. PCL leverages two modules to mimic the peer collaboration process, which cooperatively evaluates the importance of each sample. Specifically, each module first estimates the importance weights of different samples by encoding the information provided by the other module from both global and local perspectives. Then, both modules leverage the obtained importance weights to guide the reevaluation of the loss value of each sample. In this way, the peers can mutually absorb knowledge to improve the robustness of few-shot models. Experiments verify that our framework combined with different few-shot methods can significantly improve the performance and robustness of the original models.
List of keywords 
Machine Learning -> ML: Few-shot learning
389
Why Rumors Spread Fast in Social Networks, and How to Stop It
We study a rumor spreading model where individuals are connected via a network structure. Initially, only a small subset of the individuals are spreading a rumor. Each individual who is connected to a spreader starts spreading the rumor with some probability as a function of their trust in the spreader, quantified by the Jaccard similarity index. Furthermore, the probability that a spreader diffuses the rumor decreases over time until they fully lose interest and stop spreading.
We focus on determining the graph parameters which govern the magnitude and pace at which the rumor spreads in this model. We prove that for the rumor to spread to a sizable fraction of the individuals, the network needs to enjoy “strong” expansion properties and most nodes should be in “well-connected” communities. Both of these characteristics are, arguably, present in real-world social networks to a certain degree, shedding light on the driving force behind the extremely fast spread of rumors in social networks.
Furthermore, we formulate a large range of countermeasures to cease the spread of a rumor. We introduce four fundamental criteria which a countermeasure ideally should possess. We evaluate all the proposed countermeasures by conducting experiments on real-world social networks such as Facebook and Twitter. We conclude that our novel decentralized countermeasures (which are executed by the individuals) generally outperform the previously studied centralized ones (which need to be imposed by a third entity such as the government).
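The trust measure mentioned in the abstract, the Jaccard similarity index, can be illustrated with a minimal sketch. The toy graph, the `trust` helper, and the choice to compare open neighbourhoods are assumptions made for illustration only; the paper's exact definition may differ.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy network: node -> set of neighbours.
graph = {
    "u": {"v", "w", "x"},
    "v": {"u", "w", "x"},
    "w": {"u", "v"},
    "x": {"u", "v"},
}


def trust(graph, node, spreader):
    # Assumption: trust is the Jaccard similarity of the two neighbour sets.
    return jaccard(graph[node], graph[spreader])


trust_uv = trust(graph, "u", "v")  # shared neighbours {w, x} out of {u, v, w, x}
trust_wx = trust(graph, "w", "x")  # identical neighbour sets
```

Under this reading, two individuals with many common contacts trust each other more, so a rumor is more likely to pass between them.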
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MDA: Web and social networks
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Agent theories and models Game Theory and Economic Paradigms -> GTEP: Computational social choice
Multidisciplinary Topics and Applications -> MDA: Web and social networks
396
A Large-Scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. Numerous datasets that have emerged in the field of image enhancement so far are not film-specific. In order to facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet, based on the Laplacian Pyramid, for stylizing images across frequency bands and achieving film style outcomes. Experiments reveal that the performance of our model is superior to that of state-of-the-art techniques. Our dataset and code are available at https://github.com/CXH-Research/FilmNet.
Computer Vision -> CV: Machine learning for vision
List of keywords 
Computer Vision -> CV: Computational photography Computer Vision -> CV: Machine learning for vision
398
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Prevalent PASS methods with 2D panoramic image input usually focus on solving image distortions but lack consideration of the 3D properties of the original 360-degree data. Therefore, their performance drops considerably when the input panoramic images contain 3D disturbance. To be more robust to 3D disturbance, we propose our Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules, i.e., spherical geometry-aware image projection, spherical deformable patch embedding, and a panorama-aware loss, which take input images with 3D disturbance into account, add a spherical geometry-aware constraint on the existing deformable patch embedding, and indicate the pixel density of the original 360-degree data, respectively. Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.
Computer Vision -> CV: Recognition (object detection, categorization)
List of keywords 
Computer Vision -> CV: Scene analysis and understanding    Computer Vision -> CV: Recognition (object detection, categorization)
399
Asynchronous Communication Aware Multi-Agent Task Allocation
Multi-agent task allocation in physical environments with spatial and temporal constraints is a hard problem that is relevant in many realistic applications. A task allocation algorithm based on Fisher market clearing (FMC_TA), which can be performed either centrally or distributively, has been shown to produce high-quality allocations in comparison to both centralized and distributed state-of-the-art incomplete optimization algorithms. However, the algorithm is synchronous and therefore depends on perfect communication between agents.
We propose FMC_ATA, an asynchronous version of FMC_TA, which is robust to message latency and message loss. In contrast to the former version of the algorithm, FMC_ATA allows agents to identify dynamic events and initiate the generation of an updated allocation. Thus, it is better suited to dynamic environments. We further investigate the conditions in which the distributed version of the algorithm is preferred over the centralized version. Our results indicate that the proposed asynchronous distributed algorithm produces consistent results even when the communication level is extremely poor.
Agent-based and Multi-agent Systems -> MAS: Agent communication
Constraint Satisfaction and Optimization -> CSO: Distributed constraints
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation Agent-based and Multi-agent Systems -> MAS: Agent communication
Constraint Satisfaction and Optimization -> CSO: Distributed constraints
400
Measuring Acoustics with Collaborative Multiple Agents
As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by setting up a loudspeaker and microphone in the environment for all source/receiver locations, which is time-consuming and inefficient. We propose to let two robots measure the environment’s acoustics by actively moving and emitting/receiving sweep signals. We also devise a collaborative multi-agent policy where these two robots are trained to explore the environment’s acoustics while being rewarded for wide exploration and accurate prediction. We show that the robots learn to collaborate and move to explore environment acoustics while minimizing the prediction error. To the best of our knowledge, we present the very first problem formulation and solution to the task of collaborative environment acoustics measurements with multiple agents.
Game Theory and Economic Paradigms -> GTEP: Cooperative games
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning Game Theory and Economic Paradigms -> GTEP: Cooperative games
408
Learning Object Consistency and Interaction in Image Generation from Scene Graphs
This paper is concerned with synthesizing images conditioned on a scene graph (SG), a set of object nodes and their edges of interactive relations. We divide existing works into image-oriented and code-oriented methods. In our analysis, the image-oriented methods do not consider object interaction in spatial hidden features. On the other hand, our empirical study shows that the code-oriented methods lose object consistency, as their generated images miss certain objects in the input scene graph. To alleviate these two issues, we propose Learning Object Consistency and Interaction (LOCI). To preserve object consistency, we design a consistency module with a weighted augmentation strategy for objects that are easily ignored and a matching loss between scene graphs and image codes. To learn object interaction, we design an interaction module consisting of three kinds of message propagation between the input scene graph and the learned image code. Experiments on COCO-stuff and Visual Genome datasets show that our proposed method reduces the omission of objects and outperforms the state-of-the-art on visual fidelity of generated images and objects.
Computer Vision -> CV: Scene analysis and understanding
List of keywords 
Computer Vision -> CV: Neural generative models, auto encoders, GANs   Computer Vision -> CV: Scene analysis and understanding
429
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using the voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance field in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Applications
434
Unreliable Partial Label Learning with Recursive Separation
Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, and among which only one is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, the self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.
List of keywords 
Machine Learning -> ML: Weakly supervised learning
435
ViT-CX: Causal Explanation of Vision Transformers
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specifically for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs such as causal overdetermination are considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job of revealing all important evidence for the predictions than previous methods. The explanation generated by ViT-CX also shows significantly better faithfulness to the model. The code and appendix are available at https://github.com/vaynexie/CausalX-ViT.
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning
List of keywords 
Computer Vision -> CV: Interpretability and transparency AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning
449
CTW: Confident Time-Warping for Time-Series Label-Noise Learning
Noisy labels seriously degrade the generalization ability of Deep Neural Networks (DNNs) in various classification tasks. Existing studies on label-noise learning mainly focus on computer vision, while time series also suffer from the same issue. Directly applying the methods from computer vision to time series may reduce the temporal dependency due to different data characteristics. How to make use of the properties of time series to enable DNNs to learn robust representations in the presence of noisy labels has not been fully explored. To this end, this paper proposes a method that expands the distribution of Confident instances by Time-Warping (CTW) to learn robust representations of time series. Specifically, since applying the augmentation method to all data may introduce extra mislabeled data, we select confident instances to implement Time-Warping. In addition, we normalize the distribution of the training loss of each class to eliminate the model’s selection preference for instances of different classes, alleviating the class imbalance caused by sample selection. Extensive experimental results show that CTW achieves state-of-the-art performance on the UCR datasets when dealing with different types of noise. Besides, the t-SNE visualization of our method verifies that augmenting confident data improves the generalization ability. Our code is available at https://github.com/qianlima-lab/CTW.
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Time series and data streams
List of keywords 
Machine Learning -> ML: Classification Machine Learning -> ML: Representation learning
Machine Learning -> ML: Time series and data streams
451
Invertible Residual Neural Networks with Conditional Injector and Interpolator for Point Cloud Upsampling
Point clouds obtained by LiDAR and other sensors are usually sparse and irregular. Low-quality point clouds have a serious influence on the final performance of downstream tasks. Recently, a point cloud upsampling network with normalizing flows has been proposed to address this problem. However, the network heavily relies on designing specialized architectures to achieve invertibility. In this paper, we propose a novel invertible residual neural network for point cloud upsampling, called PU-INN, which allows unconstrained architectures to learn more expressive feature transformations. Then, we propose a conditional injector to improve the nonlinear transformation ability of the neural network while guaranteeing invertibility. Furthermore, a lightweight interpolator is proposed based on semantic similarity distance in the latent space, which can intuitively reflect the interpolation changes in Euclidean space. Qualitative and quantitative results show that our method outperforms the state-of-the-art works in terms of distribution uniformity, proximity-to-surface accuracy, 3D reconstruction quality, and computation efficiency.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Probabilistic machine learning
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Probabilistic machine learning
460
ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Unlike the commonly adopted self-supervised strategies based on data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose a Gated Multi-Frame Fusion block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and thereby to predict scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks.
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Scene analysis and understanding
List of keywords 
Computer Vision -> CV: Motion and tracking Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Scene analysis and understanding
479
Causal Deep Reinforcement Learning Using Observational Data
Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, such as in autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
485
KDLGT: A Linear Graph Transformer Framework via Kernel Decomposition Approach
In recent years, graph Transformers (GTs) have been demonstrated as a robust architecture for a wide range of graph learning tasks. However, the quadratic complexity of GTs limits their scalability on large-scale data, in comparison to Graph Neural Networks (GNNs). In this work, we propose the Kernel Decomposition Linear Graph Transformer (KDLGT), an accelerating framework for building scalable and powerful GTs. KDLGT employs the kernel decomposition approach to rearrange the order of matrix multiplication, thereby reducing complexity to linear. Additionally, it categorizes GTs into three distinct types and provides tailored accelerating methods for each category to encompass all types of GTs. Furthermore, we provide a theoretical analysis of the performance gap between KDLGT and self-attention to ensure its effectiveness. Under this framework, we select two representative GTs to design our models. Experiments on both real-world and synthetic datasets indicate that KDLGT not only achieves state-of-the-art performance on various datasets but also reaches an acceleration ratio of approximately 10 on graphs of certain sizes.
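The idea of rearranging the order of matrix multiplication can be sketched with generic kernelised attention. This is not the KDLGT method itself: the feature map `phi`, the function names, and the toy matrices are assumptions chosen for illustration. Computing phi(Q)(phi(K)^T V) instead of (phi(Q)phi(K)^T)V avoids forming the n-by-n score matrix, so the cost grows linearly in the number of nodes n.

```python
import math


def phi(x):
    # Hypothetical positive feature map: x + 1 for x > 0, exp(x) otherwise.
    return x + 1.0 if x > 0 else math.exp(x)


def quadratic_attention(Q, K, V):
    """Forms all n*n kernel scores explicitly, then averages the values."""
    fq = [[phi(x) for x in row] for row in Q]
    fk = [[phi(x) for x in row] for row in K]
    out = []
    for q in fq:
        weights = [sum(a * b for a, b in zip(q, k)) for k in fk]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same result, but phi(K)^T V (size d x d_v) is computed once and reused."""
    fq = [[phi(x) for x in row] for row in Q]
    fk = [[phi(x) for x in row] for row in K]
    d, dv = len(fk[0]), len(V[0])
    kv = [[sum(k[i] * v[j] for k, v in zip(fk, V)) for j in range(dv)]
          for i in range(d)]
    k_sum = [sum(k[i] for k in fk) for i in range(d)]
    out = []
    for q in fq:
        denom = sum(q[i] * k_sum[i] for i in range(d))
        out.append([sum(q[i] * kv[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out


# Toy inputs: n = 3 tokens (nodes), feature dimension d = 2.
Q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
K = [[0.7, 0.1], [-0.3, 0.2], [0.5, -0.4]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

Both functions return the same outputs up to floating-point error; only the order of the multiplications differs.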
Data Mining -> DM: Mining graphs
List of keywords 
Data Mining -> DM: Big data and scalability Data Mining -> DM: Mining graphs
490
Physics-Guided Human Motion Capture with Pose Probability Modeling
Incorporating physics in human motion capture to avoid artifacts like floating, foot sliding, and ground penetration is a promising direction. Existing solutions always adopt kinematic results as reference motions, and the physics is treated as a post-processing module. However, due to the depth ambiguity, monocular motion capture inevitably suffers from noise, and the noisy reference often leads to failure of physics-based tracking. To address these obstacles, our key idea is to employ physics as denoising guidance in the reverse diffusion process to reconstruct physically plausible human motion from a modeled pose probability distribution. Specifically, we first train a latent Gaussian model that encodes the uncertainty of 2D-to-3D lifting to facilitate reverse diffusion. Then, a physics module is constructed to track the motion sampled from the distribution. The discrepancies between the tracked motion and image observation are used to provide explicit guidance for the reverse diffusion model to refine the motion. With several iterations, the physics-based tracking and kinematic denoising promote each other to generate a physically plausible human motion. Experimental results show that our method outperforms previous physics-based methods in both joint accuracy and success rate. More information can be found at https://github.com/Me-Ditto/Physics-Guided-Mocap.
Computer Vision -> CV: 3D computer vision
List of keywords 
Computer Vision -> CV: Biometrics, face, gesture and pose recognition Computer Vision -> CV: 3D computer vision
526
Constraints First: A New MDD-based Model to Generate Sentences Under Constraints
This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
Natural Language Processing -> NLP: Language generation
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint programming Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
Natural Language Processing -> NLP: Language generation
534
On the Fairness Impacts of Private Ensembles Models
The Private Aggregation of Teacher Ensembles (PATE) is a machine learning framework that enables the creation of private models through the combination of multiple "teacher" models and a "student" model. The student model learns to predict an output based on the voting of the teachers, and the resulting model satisfies differential privacy. PATE has been shown to be effective in creating private models in semi-supervised settings or when protecting data labels is a priority. 
This paper explores whether the use of PATE can result in unfairness, and demonstrates that it can lead to accuracy disparities among groups of individuals. The paper also analyzes the algorithmic and data properties that contribute to these disproportionate impacts, why these aspects are affecting different groups disproportionately, and offers recommendations for mitigating these effects.
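The teacher-voting step described above can be sketched as a noisy plurality vote. The abstract does not state the noise mechanism; adding Laplace noise to the per-class vote counts is an assumption based on the commonly described PATE aggregation, and the function name, parameters, and example votes are hypothetical.

```python
import random


def noisy_vote(teacher_labels, num_classes, scale, rng):
    """Return the class with the highest (optionally noised) teacher vote count.

    Assumption: Laplace(0, scale) noise is added to each count when scale > 0.
    A Laplace sample is drawn as the difference of two exponential samples.
    """
    counts = [0.0] * num_classes
    for label in teacher_labels:
        counts[label] += 1.0
    if scale > 0:
        counts = [c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
                  for c in counts]
    return max(range(num_classes), key=counts.__getitem__)


# Hypothetical votes from five teachers on one query, three classes.
votes = [0, 1, 1, 2, 1]
plain = noisy_vote(votes, 3, 0.0, random.Random(0))  # no noise: plain plurality
noisy = noisy_vote(votes, 3, 2.0, random.Random(0))  # noised label for the student
```

With a larger `scale` the returned label departs from the plurality vote more often, and queries on which the teachers agree only narrowly are the ones most easily flipped.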
Computer Vision -> CV: Bias, fairness and privacy
Multidisciplinary Topics and Applications -> MDA: Security and privacy
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI Computer Vision -> CV: Bias, fairness and privacy
Multidisciplinary Topics and Applications -> MDA: Security and privacy
536
BPNet: Bézier Primitive Segmentation on 3D Point Clouds
This paper proposes BPNet, a novel end-to-end deep learning framework to learn Bézier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We also introduce a reconstruction module where we successfully process multiple CAD models with different primitives simultaneously. We conducted extensive experiments on the synthetic ABC dataset and real-scan datasets to validate and compare our approach with different baseline methods. Experiments show superior performance over previous work in terms of segmentation, with a substantially faster inference speed.
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Segmentation
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Segmentation
539
Rubik’s Optical Neural Networks: Multi-task Learning with Physics-aware Rotation Architecture
Recently, there have been increasing efforts to advance optical neural networks (ONNs), which bring significant advantages for machine learning (ML) in terms of power efficiency, parallelism, and computational speed. Given the considerable benefits in computation speed and energy efficiency, there is significant interest in applying ONNs to medical sensing, security screening, drug detection, and autonomous driving. However, due to the challenge of implementing reconfigurability, deploying multi-task learning (MTL) algorithms on ONNs requires re-building and duplicating the physical diffractive systems, which significantly degrades the energy and cost efficiency in practical application scenarios. This work presents a novel ONN architecture, namely RubikONNs, which utilizes the physical properties of optical systems to encode multiple feed-forward functions by physically rotating the hardware, similarly to rotating a Rubik’s Cube. To optimize MTL performance on RubikONNs, two domain-specific physics-aware training algorithms, RotAgg and RotSeq, are proposed. Our experimental results demonstrate more than 4x improvements in energy and cost efficiency with marginal accuracy degradation compared to the state-of-the-art approaches.
Machine Learning -> ML: Classification
Multidisciplinary Topics and Applications -> MDA: Physical sciences
List of keywords 
Multidisciplinary Topics and Applications -> MDA: AI hardware Machine Learning -> ML: Classification
Multidisciplinary Topics and Applications -> MDA: Physical sciences
541
On Approximating Total Variation Distance
Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results.
1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms.
2. There is a fully polynomial-time deterministic approximation scheme (FPTAS)  for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
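The contrast drawn in result 1 can be illustrated with a minimal Python sketch. It is not taken from the paper; the function names are hypothetical, and the squared Hellinger distance is assumed to follow the convention H^2 = 1 - BC, where BC is the Bhattacharyya coefficient. The brute-force TV computation enumerates all 2^n points, whereas the Hellinger quantity factorizes over the marginals and needs only n terms.

```python
from itertools import product
from math import prod, sqrt


def point_prob(ps, x):
    """Probability of bit-string x under a product of Bernoulli(ps[i]) marginals."""
    return prod(p if bit else 1.0 - p for p, bit in zip(ps, x))


def tv_bruteforce(ps, qs):
    """TV distance by summing over all 2^n points (exponential time)."""
    n = len(ps)
    return 0.5 * sum(
        abs(point_prob(ps, x) - point_prob(qs, x))
        for x in product((0, 1), repeat=n)
    )


def hellinger_sq_bruteforce(ps, qs):
    """Squared Hellinger distance (1 - BC) by full enumeration, for comparison."""
    n = len(ps)
    bc = sum(
        sqrt(point_prob(ps, x) * point_prob(qs, x))
        for x in product((0, 1), repeat=n)
    )
    return 1.0 - bc


def hellinger_sq_tensorized(ps, qs):
    """Same quantity in O(n): BC of a product is the product of per-coordinate BCs."""
    bc = prod(sqrt(p * q) + sqrt((1.0 - p) * (1.0 - q)) for p, q in zip(ps, qs))
    return 1.0 - bc
```

The two Hellinger functions agree up to floating-point error on small inputs, while no analogous per-coordinate formula is used for `tv_bruteforce`, which is the gap the abstract's hardness result addresses.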
List of keywords 
Machine Learning -> ML: Other
547
Ensemble Reinforcement Learning in Continuous Spaces — A Hierarchical Multi-Step Approach for Policy Training
Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research has shown that actor-critic DRL algorithms often fail to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have been proposed lately to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
List of keywords 
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Ensemble methods
555
Strategic Adversarial Attacks in AI-assisted Decision Making to Reduce Human Trust and Reliance
With the increased integration of AI technologies in human decision making processes, adversarial attacks on AI models become a greater concern than ever before as they may significantly hurt humans’ trust in AI models and decrease the effectiveness of human-AI collaboration. While many adversarial attack methods have been proposed to decrease the performance of an AI model, limited attention has been paid to understanding how these attacks will impact the human decision makers interacting with the model, and accordingly, how to strategically deploy adversarial attacks to maximize the reduction of human trust and reliance. In this paper, through a human-subject experiment, we first show that in AI-assisted decision making, the timing of the attacks largely influences how much humans decrease their trust in and reliance on AI: the decrease is particularly salient when attacks occur on decision making tasks on which humans are highly confident themselves. Based on these insights, we next propose an algorithmic framework to infer the human decision maker’s hidden trust in the AI model and dynamically decide when the attacker should launch an attack on the model. Our evaluations show that by following the proposed approach, attackers deploy more efficient attacks and achieve higher utility than with other baseline strategies.
List of keywords 
Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Human-computer interaction
580
Robust Steganography without Embedding Based on Secure Container Synthesis and Iterative Message Recovery
Synthesis-based steganography without embedding (SWE) methods transform secret messages to container images synthesised by generative networks, which eliminates distortions of container images and thus can fundamentally resist typical steganalysis tools. However, existing methods suffer from weak message recovery robustness, low synthesis fidelity, and the risk of message leakage. To address these problems, we propose a novel robust steganography without embedding method in this paper. In particular, we design a secure weight modulation-based generator by introducing secure factors to hide secret messages in synthesised container images. In this manner, the synthesised results are modulated by secure factors, so the secret messages are inaccessible when using fake factors, which reduces the risk of message leakage. Furthermore, we design a difference predictor via the reconstruction of tampered container images together with an adversarial training strategy to iteratively update the estimation of hidden messages. This ensures robustness of recovering hidden messages, while degradation of synthesis fidelity is reduced since the generator is not included in the adversarial training. Extensive experimental results demonstrate that our proposed method is effective in avoiding message leakage and superior to other existing methods in terms of recovery robustness and synthesis fidelity.
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Security and privacy
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs
586
PasCore: A Chinese Overlapping Relation Extraction Model Based on Global Pointer Annotation Strategy
Recent work on extracting relations from text has achieved excellent performance. However, existing studies mainly focus on simple relation extraction; these methods perform poorly on the overlapping triple problem because the tags of shared entities conflict with each other. In particular, overlapping entities are common and indispensable in Chinese. To address this issue, this paper proposes PasCore, which utilizes a global pointer annotation strategy for overlapping relation extraction in Chinese. PasCore first obtains the sentence vector via a general pre-trained model encoder and uses a classifier to predict relations. Subsequently, it uses the global pointer annotation strategy for head entity annotation, which uses global tags to label the start and end positions of the entities. Finally, PasCore integrates the relation, head entity and its type to mark the tail entity. Furthermore, PasCore performs conditional layer normalization to fuse features, which connects all stages and greatly enriches the association between relations and entities. Experimental results on both Chinese and English real-world datasets demonstrate that PasCore outperforms strong baselines on relation extraction and, especially, shows superior performance on overlapping relation extraction.
List of keywords 
Natural Language Processing -> NLP: Information extraction
Data Mining -> DM: Knowledge graphs and knowledge base completion
596
Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction
Magnetic resonance imaging (MRI) tasks often involve multiple contrasts. Recently, numerous deep learning-based multi-contrast MRI super-resolution (SR) and reconstruction methods have been proposed to explore the complementary information from the multi-contrast images. However, these methods either construct parameter-sharing networks or manually design fusion rules, failing to accurately model the correlations between multi-contrast images and lacking interpretability. In this paper, we propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm with a well-designed data fidelity term. Specifically, we build an observation model for the multi-contrast MR images to explicitly model the multi-contrast images as common features and unique features. In this way, only the useful information in the reference image can be transferred to the target image, while the inconsistent information will be ignored. We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model. In particular, the proximal operators are replaced by learnable ResNet. In addition, multi-scale dictionaries are introduced to further improve the model performance. We test our MC-CDic model on multi-contrast MRI SR and reconstruction tasks. Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods. Code is available at https://github.com/lpcccc-cv/MC-CDic.
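For readers unfamiliar with the proximal gradient algorithm mentioned in the abstract, the following is a minimal Python sketch of its classical form (ISTA) on a small sparse least-squares problem. It is a stand-in for illustration only, not the MC-CDic model: the objective, the soft-thresholding proximal operator, and all names here are assumptions. In the paper, the proximal step is replaced by a learnable ResNet and the iterations are unrolled into network stages.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, n_iters):
    """Proximal gradient descent for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows, b a list of floats. Each iteration takes a gradient
    step on the smooth data-fidelity term, then applies the proximal operator
    of the non-smooth penalty.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# With an identity matrix the minimizer is the soft-thresholded observation.
print(ista([[1, 0], [0, 1]], [3.0, -0.5], lam=1.0, step=1.0, n_iters=5))  # [2.0, 0.0]
```

Unrolling means fixing `n_iters` in advance and treating each pass through the loop body as one layer whose parameters can be learned.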
List of keywords 
Computer Vision -> CV: Biomedical image analysis
604
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with natural language descriptions. Current methods either fail to leverage the local details or are computationally expensive. What’s worse, they fail to leverage the heterogeneous concepts in data. In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide the coarse feature into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts corresponds to a set of textual concepts, we propose an adaptive pooling method to aggregate semantic concepts to address the partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior in efficiency and granularity, ensuring fine-grained interactions with a computational complexity similar to that of coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms the existing state-of-the-art methods.
List of keywords 
Computer Vision -> CV: Image and video retrieval
Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Vision and language
607
Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint
3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because the physical contact is sensitive to small changes in pose, the highly nonlinear mapping from the 3D object representation to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.
List of keywords 
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
619
Controlling Neural Style Transfer with Deep Reinforcement Learning
Controlling the degree of stylization in Neural Style Transfer (NST) is difficult because it usually requires hand-tuning of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps. It is an easily user-controlled style-transfer method. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Applications
Computer Vision -> CV: Applications
621
Compositional Zero-Shot Artistic Font Synthesis
Recently, many researchers have made remarkable achievements in the field of artistic font synthesis, with impressive glyph style and effect style in the results. However, due to less exploration in style disentanglement, it is difficult for existing methods to envision unseen style (glyph-effect) compositions of artistic fonts, and thus they can only learn the seen style compositions. To solve this problem, we propose a novel compositional zero-shot artistic font synthesis GAN (CAFS-GAN), which allows the synthesis of unseen style compositions by exploring the visual independence and joint compatibility of encoding semantics between glyph and effect. Specifically, we propose two contrast-based style encoders to achieve style disentanglement, since glyph and effect are intertwined in the image. Meanwhile, to preserve more glyph and effect detail, we propose a generator based on hierarchical dual styles AdaIN to reorganize content-styles representations from structure to texture gradually. Extensive experiments demonstrate the superiority of our model in generating high-quality artistic font images with unseen style compositions against other state-of-the-art methods. The source code and data are available at moonlight03.github.io/CAFS-GAN/.
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Neural generative models, auto encoders, GANs
639
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns. Compared to the existing methods that only focus on the discrepant-specific patterns (e.g., noises, textures, and frequencies), our method generalizes better. Specifically, we first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. DisGE consists of two branches, where the mainstream backbone branch is used to extract general semantic features, and the accessorial discrepant external attention branch is used to extract explicit forgery cues. Besides, a Double-Head Reconstruction (DouHR) module is proposed to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns, such that the forgery detection capability on unknown patterns can be improved. Extensive experimental results on four challenging datasets validate the effectiveness of our proposed method against state-of-the-art competitors.
List of keywords 
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
644
Dichotomous Image Segmentation with Frequency Priors
Dichotomous image segmentation (DIS) has a wide range of real-world applications and has gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative frequency priors. Our model, called FP-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency prior generator to jointly utilize a fixed filter and learnable filters to extract informative frequency priors. Before embedding the frequency priors into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency prior embedding module to embed the frequency priors into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FP-DIS outperforms state-of-the-art methods by a large margin in terms of key evaluation metrics.
List of keywords 
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Scene analysis and understanding
648
HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-independent and identically distributed (non-IID) data among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies on four datasets show that HyperFed is effective in enhancing the performance of FL under the non-IID setting.
List of keywords 
Machine Learning -> ML: Federated learning
Computer Vision -> CV: Representation learning
656
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle rendering. The constructed training samples are closely aligned to the testing instances, without the need for data annotation. To make full use of the masked images, we designed a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.
List of keywords 
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Vision and language
658
Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection
Change detection is a widely adopted technique in remote sensing imagery (RSI) analysis for discovering long-term geomorphic evolution. To highlight the areas of semantic changes, previous efforts mostly pay attention to learning representative feature descriptors of a single image, while the difference information is either modeled with simple difference operations or implicitly embedded via feature interactions. Nevertheless, such difference modeling can be noisy since it suffers from non-semantic changes and lacks explicit guidance from image content or context. In this paper, we revisit the importance of feature difference for change detection in RSI, and propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling (APD). Firstly, alignment leverages contextual similarity to compensate for the non-semantic difference in feature space. Next, a difference module trained with semantic-wise perturbation is adopted to learn more generalized change estimators, which reversely bootstraps feature extraction and prediction. Finally, a decoupled dual-decoder structure is designed to predict semantic changes in both content-aware and content-agnostic manners. Extensive experiments are conducted on benchmarks of LEVIR-CD, WHU-CD and DSIFN-CD, demonstrating that our proposed operations bring significant improvement and achieve competitive results under similar comparative conditions. Code is available at https://github.com/wangsp1999/CD-Research/tree/main/openAPD
List of keywords 
Computer Vision -> CV: Scene analysis and understanding
660
Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning
In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples at multiple levels. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA. Our code is available at https://github.com/bupt-ai-cz/ProML.
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Recognition (object detection, categorization)
663
Multi-level Graph Contrastive Prototypical Clustering
Recently, graph neural networks (GNNs) have drawn a surge of investigations in deep graph clustering. Nevertheless, existing approaches are predominantly semantic-agnostic since GNNs exhibit inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations from different granularities may conflict with each other, yielding severe performance degradation for clustering. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate the issue of conflicting objectives, we propose to perceive representations of different granularities within individual feature-, prototypical-, and cluster-level spaces via feature decorrelation, prototype contrast, and cluster space consistency, respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC against the state-of-the-art graph clustering approaches.
List of keywords 
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning
668
Universal Adaptive Data Augmentation
Existing automatic data augmentation (DA) methods either ignore updating DA’s parameters according to the target model’s state during training or adopt update strategies that are not effective enough. In this work, we design a novel data augmentation strategy called “Universal Adaptive Data Augmentation” (UADA). Different from existing methods, UADA adaptively updates DA’s parameters according to the target model’s gradient information during training: given a pre-defined set of DA operations, we randomly decide types and magnitudes of DA operations for every data batch during training, and adaptively update DA’s parameters along the gradient direction of the loss with respect to DA’s parameters. In this way, UADA can increase the training loss of the target networks, and the target networks would learn features from harder samples to improve the generalization. Moreover, UADA is very general and can be utilized in numerous tasks, e.g., image classification, semantic segmentation and object detection. Extensive experiments with various models are conducted on CIFAR-10, CIFAR-100, ImageNet, tiny-ImageNet, Cityscapes, and VOC07+12 to demonstrate the significant performance improvements brought by UADA.
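The core update described above, moving an augmentation parameter along the gradient of the training loss so that samples become harder, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the UADA implementation: the augmentation (scaling features by 1 + m), the linear model with squared loss, the hand-derived gradient, and all function names are hypothetical, and the random choice of operation types and magnitudes per batch is omitted.

```python
def augment(x, m):
    """Toy differentiable augmentation: scale every feature by (1 + m)."""
    return [(1.0 + m) * xi for xi in x]


def loss(w, x, y, m):
    """Squared error of a linear model evaluated on the augmented input."""
    pred = sum(wi * xi for wi, xi in zip(w, augment(x, m)))
    return (pred - y) ** 2


def grad_wrt_magnitude(w, x, y, m):
    """d loss / d m, derived by hand for this toy model."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    return 2.0 * ((1.0 + m) * wx - y) * wx


def ascend_magnitude(w, x, y, m, lr):
    """One gradient-ascent step on m: move toward a higher training loss."""
    return m + lr * grad_wrt_magnitude(w, x, y, m)


w, x, y, m = [0.5, 1.0], [2.0, 1.0], 1.0, 0.1
m_new = ascend_magnitude(w, x, y, m, lr=0.05)
print(loss(w, x, y, m), loss(w, x, y, m_new))  # the second value is larger
```

In a real pipeline the gradient would come from automatic differentiation through the augmentation and the network rather than from a closed-form expression.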
List of keywords 
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
669
Graph Propagation Transformer for Graph Representation Learning
This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models. The code will be released at https://github.com/czczup/GPTrans.
List of keywords 
Machine Learning -> ML: Applications
Machine Learning -> ML: Attention models
Machine Learning -> ML: Sequence and graph learning
680
First-Choice Maximality Meets Ex-ante and Ex-post Fairness
For the assignment problem where multiple indivisible items are allocated to a group of agents given their ordinal preferences, we design randomized mechanisms that satisfy first-choice maximality (FCM), i.e., maximizing the number of agents assigned their first choices, together with Pareto efficiency (PE). Our mechanisms also provide guarantees of ex-ante and ex-post fairness. The generalized eager Boston mechanism is ex-ante envy-free, and ex-post envy-free up to one item (EF1). The generalized probabilistic Boston mechanism is also ex-post EF1, and satisfies ex-ante efficiency instead of fairness. We also show that no strategyproof mechanism satisfies ex-post PE, EF1, and FCM simultaneously. In doing so, we expand the frontiers of simultaneously providing efficiency and both ex-ante and ex-post fairness guarantees for the assignment problem.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division
683
Non-Lambertian Multispectral Photometric Stereo via Spectral Reflectance Decomposition
Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image captured under multispectral illuminations. Existing MPS methods adopt the Lambertian reflectance model to make the problem tractable, but it greatly limits their application to real-world surfaces. In this paper, we propose a deep neural network named NeuralMPS to solve the MPS problem under non-Lambertian spectral reflectances. Specifically, we present a spectral reflectance decomposition model to disentangle the spectral reflectance into a geometric component and a spectral component. With this decomposition, we show that the MPS problem for surfaces with a uniform material is equivalent to the conventional photometric stereo (CPS) with unknown light intensities. In this way, NeuralMPS reduces the difficulty of the non-Lambertian MPS problem by leveraging the well-studied non-Lambertian CPS methods. Experiments on both synthetic and real-world scenes demonstrate the effectiveness of our method.
List of keywords 
Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision
684
Feature Staleness Aware Incremental Learning for CTR Prediction
Click-through Rate (CTR) prediction in real-world recommender systems often deals with billions of user interactions every day. To improve the training efficiency, it is common to update the CTR prediction model incrementally using the new incremental data and a subset of historical data. However, the feature embeddings of a CTR prediction model often get stale when the corresponding features do not appear in current incremental data. In the next period, the model would have a performance degradation on samples containing stale features, which we call the feature staleness problem. To mitigate this problem, we propose a Feature Staleness Aware Incremental Learning method for CTR prediction (FeSAIL) which adaptively replays samples containing stale features. We first introduce a staleness-aware sampling algorithm (SAS) to sample a fixed number of stale samples with high sampling efficiency. We then introduce a staleness-aware regularization mechanism (SAR) for a fine-grained control of the feature embedding updating. We instantiate FeSAIL with a general deep learning-based CTR prediction model and the experimental results demonstrate FeSAIL outperforms various state-of-the-art methods on four benchmark datasets. The code can be found in https://github.com/cloudcatcher888/FeSAIL.
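The feature staleness idea can be made concrete with a small Python sketch. Everything here is an assumption for illustration: staleness is taken to be the number of consecutive periods in which a feature has not appeared, samples are tuples of feature IDs, and the deterministic top-k selection is a simple stand-in rather than the paper's SAS sampling algorithm or SAR regularizer.

```python
def update_staleness(staleness, current_samples):
    """Reset staleness for features seen this period; increment all others."""
    seen = {f for sample in current_samples for f in sample}
    updated = {f: s + 1 for f, s in staleness.items() if f not in seen}
    updated.update({f: 0 for f in seen})
    return updated


def pick_replay_samples(history, staleness, k):
    """Return up to k historical samples, most stale first.

    A sample's score is the largest staleness among its features; samples
    whose features are all fresh (score 0) are never replayed.
    """
    def score(sample):
        return max((staleness.get(f, 0) for f in sample), default=0)

    stale = [s for s in history if score(s) > 0]
    return sorted(stale, key=score, reverse=True)[:k]


staleness = update_staleness({"a": 0, "b": 2, "c": 0}, [("a",), ("a", "d")])
history = [("a", "b"), ("c",), ("a",)]
print(pick_replay_samples(history, staleness, k=1))  # [('a', 'b')]
```

The replayed samples would then be mixed into the incremental training data so that embeddings of absent features keep receiving updates.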
Machine Learning -> ML: Incremental learning
List of keywords 
Data Mining -> DM: Recommender systems Machine Learning -> ML: Incremental learning
694
CADParser: A Learning Approach of Sequence Modeling for B-Rep CAD
Computer-Aided Design (CAD) plays a crucial role in industrial manufacturing by providing geometry information and the construction workflow for manufactured objects. The construction information enables effective re-editing of parametric CAD models. While boundary representation (B-Rep) is the standard format for representing geometry structures, JSON format is an alternative due to the lack of uniform criteria for storing the construction workflow. Regrettably, most CAD models available on the Internet only offer geometry information, omitting the construction procedure and hampering creation efficiency. This paper proposes a learning approach CADParser to infer the underlying modeling sequences given a B-Rep CAD model. It achieves this by treating the CAD geometry structure as a graph and the construction workflow as a sequence. Since the existing CAD dataset only contains two operations (i.e., Sketch and Extrusion), limiting the diversity of the CAD model creation, we also introduce a large-scale dataset incorporating a more comprehensive range of operations such as Revolution, Fillet, and Chamfer. Each model includes both the geometry structure and the construction sequences. Extensive experiments demonstrate that our method can compete with the existing state-of-the-art methods quantitatively and qualitatively. Data is available at https://drive.google.com/CADParserData.
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Applications
704
FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning
Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training  large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
Machine Learning -> ML: Learning sparse models
Machine Learning -> ML: Optimization
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Learning sparse models
Machine Learning -> ML: Optimization
705
Some Might Say All You Need Is Sum
The expressivity of Graph Neural Networks (GNNs) is dependent on the aggregation functions they employ. Theoretical works have pointed towards Sum aggregation GNNs subsuming all other GNNs, while certain practical works have observed a clear advantage to using Mean and Max. An examination of the theoretical guarantee identifies two caveats. First, it is size-restricted, that is, the power of every specific GNN is limited to graphs of a specific size. Successfully processing larger graphs may require another GNN, and so on. Second, it concerns the power to distinguish non-isomorphic graphs, not the power to approximate general functions on graphs, and the former does not necessarily imply the latter.
It is desired that a GNN’s usability will not be limited to graphs of any specific size. Therefore, we explore the realm of unrestricted-size expressivity. We prove that basic functions, which can be computed exactly by Mean or Max GNNs, are inapproximable by any Sum GNN. We prove that under certain restrictions, every Mean or Max GNN can be approximated by a Sum GNN, but even there, a combination of (Sum, [Mean/Max]) is more expressive than Sum alone. Lastly, we prove further expressivity limitations for GNNs with a broad class of aggregations.
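As a toy illustration of why the choice of aggregation matters (this is not a construction from the paper, and the neighbor feature values are arbitrary), the sketch below compares how Sum, Mean and Max treat small neighbor multisets. Sum separates multisets that differ only in size, which Mean and Max cannot do, while Max separates two multisets that have the same sum and mean.

```python
from statistics import mean


def aggregate(neighbors):
    """Return the (sum, mean, max) aggregation of a multiset of scalar features."""
    return sum(neighbors), mean(neighbors), max(neighbors)


# Two multisets that differ only in size: Sum tells them apart,
# Mean and Max do not.
sum_a, mean_a, max_a = aggregate([1, 1])
sum_b, mean_b, max_b = aggregate([1, 1, 1, 1])
size_seen_by_sum = sum_a != sum_b
size_seen_by_mean_or_max = (mean_a != mean_b) or (max_a != max_b)

# Two multisets with equal sum and mean: only Max tells them apart.
sum_c, mean_c, max_c = aggregate([0, 2])
sum_d, mean_d, max_d = aggregate([1, 1])
spread_seen_by_sum_or_mean = (sum_c != sum_d) or (mean_c != mean_d)
spread_seen_by_max = max_c != max_d
```

This only shows that each aggregation discards different information on fixed inputs; the abstract's results concern approximation over graphs of unrestricted size, which is a much stronger statement.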
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Learning theory
List of keywords 
Machine Learning -> ML: Theory of deep learning Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Learning theory
721
Complexity of Efficient Outcomes in Binary-Action Polymatrix Games and Implications for Coordination Problems
We investigate the difficulty of finding economically efficient solutions to coordination problems on graphs. Our work focuses on two forms of coordination problem: pure-coordination games and anti-coordination games. We consider three objectives in the context of simple binary-action polymatrix games: (i) maximizing welfare, (ii) maximizing potential, and (iii) finding a welfare-maximizing Nash equilibrium. We introduce an intermediate, new graph-partition problem, termed MWDP, which is of independent interest, and we provide a complexity dichotomy for it. This dichotomy, among other results, provides as a corollary a dichotomy for Objective (i) for general binary-action polymatrix games. In addition, it reveals that the complexity of achieving these objectives varies depending on the form of the coordination problem. Specifically, Objectives (i) and (ii) can be efficiently solved in pure-coordination games, but are NP-hard in anti-coordination games. Finally, we show that Objective (iii) is NP-hard even for simple non-trivial pure-coordination games.
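A brute-force sketch of Objective (i) on a toy instance, under simplifying assumptions that are not taken from the paper: every edge carries a single non-negative weight, and both endpoints earn that weight when their actions match (pure coordination) or differ (anti-coordination). The function names are hypothetical and the enumeration is exponential in the number of players.

```python
from itertools import product


def welfare(actions, edges, anti=False):
    """Total payoff of all players for one binary action profile.

    edges is a list of (u, v, weight). Both endpoints of an edge earn the
    weight when the edge is 'satisfied': equal actions in a pure-coordination
    game, different actions in an anti-coordination game (assumed payoffs).
    """
    return sum(2 * w for u, v, w in edges if (actions[u] != actions[v]) == anti)


def max_welfare(n_players, edges, anti=False):
    """Exhaustively search all 2^n action profiles for the best welfare."""
    return max(welfare(a, edges, anti) for a in product((0, 1), repeat=n_players))


# Unit-weight triangle: coordination satisfies all 3 edges,
# anti-coordination can satisfy at most 2 of them.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
best_coordination = max_welfare(3, triangle)
best_anti = max_welfare(3, triangle, anti=True)
```

Under these assumed payoffs, pure coordination is maximized trivially by a unanimous profile, whereas the anti-coordination optimum equals twice the weight of a maximum cut, which is consistent with the contrast in complexity that the abstract reports.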
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
728
Dynamic Belief for Decentralized Multi-Agent Cooperative Learning
Decentralized multi-agent cooperative learning is a practical task due to the partially observed setting both in training and execution. Every agent learns to cooperate without access to the observations and policies of others. However, decentralized multi-agent training is very difficult due to non-stationarity, especially when other agents’ policies are also being learned during training. To overcome this, we propose to learn a dynamic policy belief for each agent to predict the current policies of other agents and condition its own policy accordingly. To quickly adapt to the development of others’ policies, we introduce a historical context to learn the belief inference from a few recent action histories of other agents, and a latent variational inference to model their policies by a learned distribution. We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in decentralized training settings and results comparable with state-of-the-art CTDE methods.
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
729
The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks
Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving.
While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements.
In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound of the correct count while significantly reducing computational requirements. 
We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound.
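To make the idea of a probabilistic bound on the amount of unsafe inputs concrete, here is a generic Monte Carlo sketch. It is not the paper's algorithm: it uses plain uniform sampling with a Hoeffding bound, it estimates a violation rate over a continuous input box rather than an exact count, and the one-neuron network and the safety property are hypothetical stand-ins.

```python
import math
import random


def toy_network(x1, x2):
    """Hypothetical one-neuron ReLU 'network' standing in for a DNN."""
    return max(0.0, x1 + x2 - 1.0)


def is_unsafe(x1, x2):
    """Hypothetical safety property: the output must stay at 0."""
    return toy_network(x1, x2) > 0.0


def estimate_violation_rate(n_samples, delta, seed=0):
    """Estimate the fraction of unsafe inputs in [0, 1]^2 by uniform sampling.

    Returns (estimate, eps). By Hoeffding's inequality the true rate lies
    within eps of the estimate with probability at least 1 - delta.
    """
    rng = random.Random(seed)
    hits = sum(is_unsafe(rng.random(), rng.random()) for _ in range(n_samples))
    estimate = hits / n_samples
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    return estimate, eps


estimate, eps = estimate_violation_rate(20_000, delta=0.01)
```

For this toy property the unsafe region is the half of the unit square where x1 + x2 > 1, so the true rate is 0.5 and the estimate should land close to it. The interval half-width eps shrinks as the square root of the sample count grows.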
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis AI Ethics, Trust, Fairness -> ETF: Safety and robustness
731
Generalization Guarantees of Self-Training of Halfspaces under Label Noise Corruption
We investigate the generalization properties of a self-training algorithm with halfspaces. The approach learns a list of halfspaces iteratively from labeled and unlabeled training data, in which each iteration consists of two steps: exploration and pruning. In the exploration phase, the halfspace is found sequentially by maximizing the unsigned-margin among  unlabeled examples and then assigning pseudo-labels to those that have a distance higher than the current threshold. These pseudo-labels are allegedly corrupted by noise. The training set is then augmented with noisy pseudo-labeled examples, and a new classifier is trained.  This process is repeated until no more unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples that have a distance to the last halfspace greater than the associated  unsigned-margin are then discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded and show that the resulting semi-supervised approach never degrades performance compared to the  classifier learned using only the initial labeled training set. Experiments carried out on a variety of benchmarks demonstrate the efficiency of the proposed approach compared to state-of-the-art methods.
Machine Learning -> ML: Semi-supervised learning
List of keywords 
Machine Learning -> ML: Learning theory Machine Learning -> ML: Semi-supervised learning
736
Sequential Recommendation with Probabilistic Logical Reasoning
Deep learning and symbolic learning are two frequently employed methods in Sequential Recommendation (SR). Recent neural-symbolic SR models demonstrate their potential to enable SR to be equipped with concurrent perception and cognition capacities. However, neural-symbolic SR remains a challenging problem due to open issues like representing users and items in logical reasoning. In this paper, we combine the Deep Neural Network (DNN) SR models with logical reasoning and propose a general framework named Sequential Recommendation with Probabilistic Logical Reasoning (SR-PLR). This framework allows SR-PLR to benefit from both similarity matching and logical reasoning by disentangling feature embedding and logic embedding in the DNN and probabilistic logic network. To better capture the uncertainty and evolution of user tastes, SR-PLR embeds users and items with a probabilistic method and conducts probabilistic logical reasoning on users’ interaction patterns. Then the feature and logic representations learned from the DNN and logic network are concatenated to make the prediction. Finally, experiments on various sequential recommendation models demonstrate the effectiveness of SR-PLR. Our code is available at https://github.com/Huanhuaneryuan/SR-PLR.
Data Mining -> DM: Collaborative filtering
List of keywords 
Data Mining -> DM: Recommender systems Data Mining -> DM: Collaborative filtering
747
Learning Few-shot Sample-set Operations for Noisy Multi-label Aspect Category Detection
Multi-label Aspect Category Detection (MACD) is essential for aspect-based sentiment analysis, which aims to identify multiple aspect categories in a given sentence. Few-shot MACD is critical due to the scarcity of labeled data. However, MACD is a high-noise task, and existing methods fail to address it with only two or three training samples per class, which limits the application in practice. To address the above issues, we propose a group of Few-shot Sample-set Operations (FSO) to solve noisy MACD in fewer sample scenarios by identifying the semantic contents of samples. Learning interactions among intersection, subtraction, and union networks, the FSO imitates arithmetic operations on samples to distinguish relevant and irrelevant aspect contents. Eliminating the negative effect caused by noise, the FSO extracts discriminative prototypes and customizes a dedicated query vector for each class. Besides, we design a multi-label architecture, which integrates with score-wise loss and multi-label loss to optimize the FSO for multi-label prediction, avoiding complex threshold training or selection. Experiments show that our method achieves considerable performance. Significantly, it improves Macro-F by up to 11.01% and by 8.59% on average in fewer sample scenarios.
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Dialogue and interactive systems
List of keywords 
Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Dialogue and interactive systems
752
Federated Probabilistic Preference Distribution Modelling with Compactness Co-Clustering for Privacy-Preserving Multi-Domain Recommendation
With the development of modern internet techniques, Cross-Domain Recommendation (CDR) systems have been widely exploited for tackling the data-sparsity problem. Meanwhile, most current CDR models assume that user-item interactions are accessible across different domains. However, such a knowledge-sharing process violates privacy protection policies. In this paper, we focus on the Privacy-Preserving Multi-Domain Recommendation problem (PPMDR). The problem is challenging since different domains are sparse and heterogeneous under privacy protection. To tackle the above issues, we propose Federated Probabilistic Preference Distribution Modelling (FPPDM). FPPDM includes two main components, i.e., a local domain modelling component and a global server aggregation component with a federated learning strategy. The local domain modelling component aims to exploit user/item preference distributions using the rating information in the corresponding domain. The global server aggregation component is set to combine user characteristics across domains. To better extract semantic neighbor information among the users, we further provide a compactness co-clustering strategy in FPPDM++ to cluster the users with similar characteristics. Our empirical studies on benchmark datasets demonstrate that FPPDM and FPPDM++ significantly outperform the state-of-the-art models.
List of keywords 
Data Mining -> DM: Recommender systems
757
Mean Payoff Optimization for Systems of Periodic Service and Maintenance
Consider oriented graph nodes requiring periodic visits by a service agent. The agent moves among the nodes and receives a payoff for each completed service task, depending on the time elapsed since the previous visit to a node. We consider the problem of finding a suitable schedule for the agent to maximize its long-run average payoff per time unit. We show that the problem of constructing an epsilon-optimal schedule is PSPACE-hard for every fixed non-negative epsilon, and that there exists an optimal periodic schedule of exponential length. We propose randomized finite-memory (RFM) schedules as a compact description of the agent’s strategies and design an efficient algorithm for constructing RFM schedules. Furthermore, we construct deterministic periodic schedules by sampling from RFM schedules.
Planning and Scheduling -> PS: Routing
Robotics -> ROB: Motion and path planning
List of keywords 
Planning and Scheduling -> PS: Robot planning Planning and Scheduling -> PS: Routing
Robotics -> ROB: Motion and path planning
774
RFENet: Towards Reciprocal Feature Evolution for Glass Segmentation
Glass-like objects are widespread in daily life but remain difficult for most existing methods to segment. Their transparency makes them difficult to distinguish from the background, while the tiny separation boundary further impedes the acquisition of their exact contour. In this paper, by revealing the key co-evolution demand of semantic and boundary learning, we propose a Selective Mutual Evolution (SME) module to enable the reciprocal feature learning between them. Then to exploit the global shape context, we propose a Structurally Attentive Refinement (SAR) module to conduct a fine-grained feature refinement for those ambiguous points around the boundary. Finally, to further utilize the multi-scale representation, we integrate the above two modules into a cascaded structure and then introduce a Reciprocal Feature Evolution Network (RFENet) for effective glass-like object segmentation. Extensive experiments demonstrate that our RFENet achieves state-of-the-art performance on three popular public datasets. Code is available at https://github.com/VankouF/RFENet.
Computer Vision -> CV: Scene analysis and understanding
List of keywords 
Computer Vision -> CV: Segmentation Computer Vision -> CV: Scene analysis and understanding
792
Optimal Seat Arrangement: What Are the Hard and Easy Cases?
We study four NP-hard optimal seat arrangement problems which each have as input a set of n agents, where each agent has cardinal preferences over other agents, and an n-vertex undirected graph (called the seat graph). The task is to assign each agent to a distinct vertex in the seat graph such that either the sum of utilities or the minimum utility is maximized, or it is envy-free or exchange-stable. Aiming at identifying hard and easy cases, we extensively study the algorithmic complexity of the four problems by looking into natural graph classes for the seat graph (e.g., paths, cycles, stars, or matchings), problem-specific parameters (e.g., the number of non-isolated vertices in the seat graph or the maximum number of agents towards whom an agent has non-zero preferences), and preference structures (e.g., non-negative or symmetric preferences). For strict preferences and seat graphs with disjoint edges and isolated vertices, we correct an error in the literature and show that finding an envy-free arrangement remains NP-hard in this case.
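The welfare-maximizing variant can be sketched by exhaustive search on a tiny instance. The abstract does not spell out how utilities are computed, so this sketch assumes that an agent's utility is the sum of its preferences toward the agents seated on adjacent vertices; the function names and the preference matrix are hypothetical.

```python
from itertools import permutations


def welfare(assignment, prefs, seat_edges):
    """Sum of all agents' utilities; assignment[seat] is the agent on that seat."""
    total = 0
    for s, t in seat_edges:
        a, b = assignment[s], assignment[t]
        # Each seat-graph edge contributes to the utility of both neighbors.
        total += prefs[a][b] + prefs[b][a]
    return total


def best_arrangement(prefs, seat_edges):
    """Try every assignment of n agents to n seats (n! candidates)."""
    n = len(prefs)
    return max(permutations(range(n)), key=lambda p: welfare(p, prefs, seat_edges))


# prefs[i][j] is agent i's cardinal preference for agent j.
prefs = [
    [0, 5, 0],
    [5, 0, 1],
    [0, 1, 0],
]
path = [(0, 1), (1, 2)]  # seat graph: a path on three seats
best = best_arrangement(prefs, path)
best_welfare = welfare(best, prefs, path)
```

On this instance the search places agent 1 on the middle seat, next to both agents it has non-zero preferences with. The factorial search space is why the restricted graph classes and parameters studied in the paper matter.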
Game Theory and Economic Paradigms -> GTEP: Cooperative games
Game Theory and Economic Paradigms -> GTEP: Fair division
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice Game Theory and Economic Paradigms -> GTEP: Cooperative games
Game Theory and Economic Paradigms -> GTEP: Fair division
794
Realistic Cell Type Annotation and Discovery for Single-cell RNA-seq Data
The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. Cell type annotation plays an essential role in the substantial downstream analysis of scRNA-seq data. Existing methods usually classify the novel cell types in target data as an “unassigned” group and rarely discover the fine-grained cell type structure among them. Besides, these methods carry risks, such as susceptibility to batch effects between reference and target data, which further compromises the inherent discrimination of target data. Considering these limitations, here we propose a new and practical task called realistic cell type annotation and discovery for scRNA-seq data. In this task, cells from seen cell types are given class labels, while cells from novel cell types are given cluster labels. To tackle this problem, we propose an end-to-end algorithm framework called scPOT from the perspective of optimal transport (OT). Specifically, we first design an OT-based prototypical representation learning paradigm to encourage both global discrimination of clusters and local consistency of cells to uncover the intrinsic structure of target data. Then we propose an unbalanced OT-based partial alignment strategy with statistical filling to detect the cells from the seen cell types across reference and target data. Notably, scPOT also introduces an easy yet effective solution to automatically estimate the overall cell type number in target data. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scPOT over various state-of-the-art clustering and annotation methods.
Machine Learning -> ML: Applications
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Bioinformatics Machine Learning -> ML: Applications
795
FGNet: Towards Filling the Intra-class and Inter-class Gaps for Few-shot Segmentation
Current few-shot segmentation (FSS) approaches have made tremendous achievements based on prototypical learning techniques. However, due to the scarcity of the support data provided, FSS methods still suffer from the intra-class and inter-class gaps. In this paper, we propose a uniform network to fill both gaps, termed FGNet. It consists of the novel design of a Self-Adaptive Module (SAM) to emphasize the query feature to generate an enhanced prototype for self-alignment. Such a prototype caters to each query sample itself since it contains the underlying intra-instance information, which gets around the intra-class appearance gap. Moreover, we design an Inter-class Feature Separation Module (IFSM) to separate the feature space of the target class from other classes, which contributes to bridging the inter-class gap. In addition, we present several new losses and a method termed B-SLIC, which help to further enhance the separation performance of FGNet. Experimental results show that FGNet reduces both gaps for FSS through SAM and IFSM, respectively, and achieves state-of-the-art performance on both PASCAL-5i and COCO-20i datasets compared with previous top-performing approaches.
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
List of keywords 
Computer Vision -> CV: Segmentation Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
797
Leveraging Argumentation for Generating Robust Sample-based Explanations
Explaining predictions made by inductive classifiers has become crucial with the rise of complex models acting more and more as black-boxes. Abductive explanations are one of the most popular types of explanations that are provided for the purpose. They highlight feature-values that are sufficient for making predictions. In the literature, they are generated by exploring the whole feature space, which is unreasonable in practice. This paper solves the problem by introducing explanation functions that generate abductive explanations from a sample of instances. It shows that such functions should be defined with great care since they cannot satisfy two desirable properties at the same time, namely existence of explanations for every individual decision (success) and correctness of explanations (coherence). The paper provides a parameterized family of argumentation-based explanation functions, each of which satisfies one of the two properties. It studies their formal properties and their experimental behaviour on different datasets.
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
List of keywords 
Knowledge Representation and Reasoning -> KRR: Argumentation AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
800
Parametrized Gradual Semantics Dealing with Varied Degrees of Compensation
Compensation is a strategy that a semantics may follow when it faces dilemmas between quality and quantity of attackers. It allows several weak attacks to compensate one strong attack. It is based on a compensation degree, which is a tuple that indicates (i) to what extent an attack is weak and (ii) the number of weak attacks needed to compensate a strong one. Existing principles on compensation do not specify the parameters, thus it is unclear whether semantics satisfying them compensate at only one degree or several degrees, and which ones. This paper proposes a parameterised family of gradual semantics, which unifies multiple semantics that share some principles but differ in their strategy for solving dilemmas. Indeed, we show that the two semantics taking the extreme values of the parameter favour respectively quantity and quality, while all the remaining ones compensate at some degree. We define three classes of compensation degrees and show that the novel family is able to compensate at all of them while none of the existing gradual semantics does.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Argumentation
809
Solving Quantum-Inspired Perfect Matching Problems via Tutte-Theorem-Based Hybrid Boolean Constraints
Determining the satisfiability of Boolean constraint-satisfaction problems with different types of constraints, that is, hybrid constraints, is a well-studied problem with important applications. We study a new application of hybrid Boolean constraints, which arises in quantum computing. The problem relates to constrained perfect matching in edge-colored graphs. While general-purpose hybrid constraint solvers can be powerful, we show that direct encodings of the constrained-matching problem as hybrid constraints scale poorly and special techniques are still needed. We propose a novel encoding based on Tutte’s Theorem in graph theory as well as optimization techniques. Empirical results demonstrate that our encoding, in suitable languages with advanced SAT solvers, scales significantly better than a number of competing approaches on constrained-matching benchmarks. Our study identifies the necessity of designing problem-specific encodings when applying powerful general-purpose constraint solvers.
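For reference, Tutte's Theorem states that a graph has a perfect matching if and only if, for every vertex subset U, the graph obtained by deleting U has at most |U| connected components with an odd number of vertices. The sketch below checks that condition by brute force on small graphs. It illustrates the theorem only and is not the Boolean encoding proposed in the paper; the function names are hypothetical and the subset enumeration is exponential.

```python
from itertools import combinations


def odd_components(vertices, edges, removed):
    """Count connected components of odd size after deleting `removed`."""
    remaining = set(vertices) - set(removed)
    adj = {v: set() for v in remaining}
    for u, v in edges:
        if u in remaining and v in remaining:
            adj[u].add(v)
            adj[v].add(u)
    seen, odd = set(), 0
    for start in remaining:
        if start in seen:
            continue
        # Depth-first search to measure the size of this component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        odd += size % 2
    return odd


def satisfies_tutte(vertices, edges):
    """True iff every subset U leaves at most |U| odd components."""
    return all(
        odd_components(vertices, edges, subset) <= len(subset)
        for k in range(len(vertices) + 1)
        for subset in combinations(vertices, k)
    )


cycle4 = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
star3 = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
cycle_ok = satisfies_tutte(*cycle4)
star_ok = satisfies_tutte(*star3)
```

The 4-cycle passes, matching the fact that it has a perfect matching. The star fails because deleting its center leaves three isolated leaves, that is, three odd components against a deleted set of size one.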
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Modeling
827
Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention
The monocular depth estimation task has recently revealed encouraging prospects, especially for the autonomous driving task. To tackle the ill-posed problem of 3D geometric reasoning from 2D monocular images, multi-frame monocular methods are developed to leverage the perspective correlation information from sequential temporal frames. However, moving objects such as cars and trains usually violate the static scene assumption, leading to feature inconsistency deviation and misaligned cost values, which would mislead the optimization algorithm. In this work, we present CTA-Depth, a Context-aware Temporal Attention guided network for multi-frame monocular Depth estimation. Specifically, we first apply a multi-level attention enhancement module to integrate multi-level image features to obtain an initial depth and pose estimation. Then the proposed CTA-Refiner is adopted to alternately optimize the depth and pose. During the CTA-Refiner process, context-aware temporal attention (CTA) is developed to capture the global temporal-context correlations to maintain the feature consistency and estimation integrity of moving objects. In particular, we propose a long-range geometry embedding (LGE) module to produce a long-range temporal geometry prior. Our approach achieves significant improvements (e.g., 13.5% for the Abs Rel metric on the KITTI dataset) over state-of-the-art approaches on three benchmark datasets.
Computer Vision -> CV: Scene analysis and understanding
Machine Learning -> ML: Attention models
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Scene analysis and understanding
Machine Learning -> ML: Attention models
830
InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning
Due to repetitive trial-and-error style interactions between agents and a fixed traffic environment during the policy learning, existing Reinforcement Learning (RL)-based Traffic Signal Control (TSC) methods greatly suffer from long RL training time and poor adaptability of RL agents to other complex traffic environments. To address these problems, we propose a novel Adversarial Inverse Reinforcement Learning (AIRL)-based pre-training method named InitLight, which enables effective initial model generation for TSC agents. Unlike traditional RL-based TSC approaches that train a large number of agents simultaneously for a specific multi-intersection environment, InitLight pre-trains only one single initial model based on multiple single-intersection environments together with their expert trajectories. Since the reward function learned by InitLight can recover ground-truth TSC rewards for different intersections at optimality, the pre-trained agent can be deployed at intersections of any traffic environment as an initial model to accelerate subsequent overall global RL training. Comprehensive experimental results show that the initial model generated by InitLight not only significantly accelerates convergence with far fewer episodes, but also has superior generalization ability to accommodate various kinds of complex traffic environments.
Machine Learning -> ML: Deep reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Sensor networks and smart cities
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Transportation Machine Learning -> ML: Deep reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Sensor networks and smart cities
847
Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method
Recovering the shape and appearance of real-world objects from natural 2D images is a long-standing and challenging inverse rendering problem. In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras. Our method follows an analysis-by-synthesis approach and consists of two phases. In the initialization phase, we use traditional SfM and MVS methods to reconstruct a virtual scene roughly matching the real scene. Then in the optimization phase, we adopt a hybrid approach to refine the geometry and reflectance, where the geometry is first optimized using an approximate differentiable rendering method, and the reflectance is optimized afterward using a physically-based differentiable rendering method. Our hybrid approach combines the efficiency of approximate methods with the high-quality results of physically-based methods. Extensive experiments on synthetic and real data demonstrate that our method can produce reconstructions with similar or higher quality than state-of-the-art methods while being more efficient.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
848
Label Enhancement via Joint Implicit Representation Clustering
Label distribution is an effective label form to portray label polysemy (i.e., the cases where an instance can be described by multiple labels simultaneously). However, the expensive annotation cost of label distributions limits their application to a wider range of practical tasks. Therefore, label enhancement (LE) techniques have been extensively studied to solve this problem. Existing LE algorithms mostly estimate label distributions from the instance relation or the label relation. However, they suffer from biased instance relations, limited model capabilities, or suboptimal local label correlations. In this paper, we therefore propose a deep generative model called JRC to simultaneously learn and cluster the joint implicit representations of both features and labels, which can be used to improve any existing LE algorithm involving the instance relation or local label correlations. In addition, we develop a novel label distribution recovery module and integrate it with the JRC model, thus constituting a novel generative label enhancement model that utilizes the learned joint implicit representations and instance clusters in a principled way. Finally, extensive experiments validate our proposal.
List of keywords
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Weakly supervised learning
856
Graph Sampling-based Meta-Learning for Molecular Property Prediction
Molecular properties are usually observed with a limited number of samples, and researchers have treated property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Second, to utilize the topological information of the MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show that GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.
List of keywords
Machine Learning -> ML: Meta-learning
Machine Learning -> ML: Few-shot learning
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
863
TG-VQA: Ternary Game of Video Question Answering
Video question answering aims at answering a question about the video content by reasoning about the alignment semantics within them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignment. In this work, we resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate fine-grained visual-linguistic alignment labels without label-intensive efforts. Our TG-VQA outperforms the existing state of the art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges well on limited data (10^4 videos), surpassing most of those pre-trained on large-scale data (10^7 videos).
List of keywords
Computer Vision -> CV: Visual reasoning and symbolic representation
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Video analysis and understanding
879
Data Level Lottery Ticket Hypothesis for Vision Transformers
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method, called the winning ticket, such that it can be trained from scratch to perform almost as well as the dense counterpart. Meanwhile, LTH in vision transformers (ViTs) has scarcely been studied. In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input data consisting of image patches. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The source code is available at https://github.com/shawnricecake/vit-lottery-ticket-input.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Theory of deep learning
880
A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram
Geometry problem solving (GPS) is a high-level mathematical reasoning task requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate the research of GPS, we build a new large-scale and finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution programs. Experiments on PGPS9K and an existing dataset, Geometry3K, validate the superiority of our method over the state-of-the-art neural solvers. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS.
List of keywords
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Multi-modal learning
Multidisciplinary Topics and Applications -> MDA: Education
882
Incomplete Multi-view Clustering via Prototype-based Imputation
In this paper, we study how to achieve two characteristics highly expected of incomplete multi-view clustering (IMvC). Namely, i) instance commonality means that within-cluster instances should share a common pattern, and ii) view versatility means that cross-view samples should own view-specific patterns. To this end, we design a novel dual-stream model which employs a dual attention layer and a dual contrastive learning loss to learn view-specific prototypes and model the sample-prototype relationship. When a view is missing, our model performs data recovery using the prototypes in the missing view and the sample-prototype relationship inherited from the observed view. Thanks to our dual-stream model, both cluster- and view-specific information can be captured, and thus the instance commonality and view versatility can be preserved to facilitate IMvC. Extensive experiments demonstrate the superiority of our method on five challenging benchmarks compared with 11 approaches. The code can be accessed from https://pengxi.me.
List of keywords
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Clustering
892
WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation
The top-down and bottom-up methods are the two mainstream approaches to referring segmentation, but both have their own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary for restraining their respective weaknesses, but a direct average combination leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods in both interaction and integration aspects to achieve a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results in a weighted manner by sampling confidence scores from the distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which demonstrates the effectiveness and generality of our method.
List of keywords
Computer Vision -> CV: Vision and language
Computer Vision -> CV: Segmentation
902
Cross-Domain Facial Expression Recognition via Disentangling Identity Representation
Most existing cross-domain facial expression recognition (FER) works require target domain data to assist the model in analyzing distribution shifts to overcome negative effects. However, it is often hard to obtain expression images of the target domain in practical applications. Moreover, existing methods suffer from the interference of identity information, thus limiting the discriminative ability of the expression features. We exploit the idea of domain generalization (DG) and propose a representation disentanglement model to address the above problems. Specifically, we learn three independent potential subspaces corresponding to the domain, expression, and identity information from facial images. Meanwhile, the extracted expression and identity features are recovered as Fourier phase information reconstructed images, thereby ensuring that the high-level semantics of images remain unchanged after disentangling the domain information. Our proposed method can disentangle expression features from expression-irrelevant ones (i.e., identity and domain features). Therefore, the learned expression features exhibit sufficient domain invariance and discriminative ability. We conduct experiments with different settings on multiple benchmark datasets, and the results show that our method achieves superior performance compared with state-of-the-art methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning
906
A Unification Framework for Euclidean and Hyperbolic Graph Neural Networks
Hyperbolic neural networks can effectively capture the inherent hierarchy of graph datasets, and are consequently a powerful choice of GNNs. However, they entangle multiple incongruent (gyro-)vector spaces within a layer, which makes them limited in terms of generalization and scalability.
In this work, we propose the Poincaré disk model as our search space, and apply all approximations on the disk (as if the disk is a tangent space derived from the origin), thus getting rid of all inter-space transformations. Such an approach enables us to propose a hyperbolic normalization layer and to further simplify the entire hyperbolic model to a Euclidean model cascaded with our hyperbolic normalization layer. We applied our proposed nonlinear hyperbolic normalization to the current state-of-the-art homogeneous and multi-relational graph networks. We demonstrate that our model not only leverages the power of Euclidean networks such as interpretability and efficient execution of various model components, but also outperforms both Euclidean and hyperbolic counterparts on various benchmarks. Our code is made publicly available at https://github.com/oom-debugger/ijcai23.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Representation learning
920
On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Pre-training has produced numerous state-of-the-art results in high-level computer vision, while few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a whole set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information to intermediate layers in super-resolution (SR), yielding significant performance gains, while pre-training hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than other alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformers and CNNs. Based on the study, we successfully develop state-of-the-art models for multiple low-level tasks.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Applications
Computer Vision -> CV: Representation learning
924
STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary 2D Exemplar?
Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in computational photography. However, existing methods generally fail to accurately learn arbitrary textures, which may result in the failure to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to extend the given 2D exemplar to arbitrary 3D solid textures. In STS-GAN, multi-scale 2D texture discriminators evaluate the similarity between the given 2D exemplar and slices from the generated 3D texture, encouraging the 3D texture generator to synthesize realistic solid textures. Finally, experiments demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the 2D exemplar.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Neural generative models, auto encoders, GANs
928
PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification
Event Causality Identification (ECI) aims to identify the causality between a pair of event mentions in a document, which is composed of sentence-level ECI (SECI) and document-level ECI (DECI). Previous work applies various reasoning models to identify the implicit event causality. However, they indiscriminately reason about all event causality in the same way, ignoring that inferring most inter-sentence event causality depends on intra-sentence event causality. In this paper, we propose a progressive graph pairwise attention network (PPAT) to consider the above dependence. PPAT applies a progressive reasoning strategy, as it first predicts the intra-sentence event causality, and then infers the more implicit inter-sentence event causality based on the SECI result. We construct a sentence boundary event relational graph, and PPAT leverages a simple pairwise attention mechanism, which attends to different reasoning chains on the graph. In addition, we propose a causality-guided training strategy for assisting PPAT in learning causality-related representations on every layer. Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank). Code is available at https://github.com/HITsz-TMG/PPAT.
List of keywords
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Information extraction
930
A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning
In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) in different algorithms and different environments.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Deep reinforcement learning
Robotics -> ROB: Cognitive robotics
952
Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods
Object detection on panoramic/spherical images has developed rapidly in the past few years, where the IoU calculator is a fundamental part of various detector components, i.e., Label Assignment, Loss and NMS. Due to the low efficiency and non-differentiability of spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key to these approximate methods is to map spherical boxes to planar boxes. However, there exist two problems in these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. These problems lead to the low accuracy of these methods. Taking the two problems into account, we propose a new sphere-plane boxes transform, called Sph2Pob. Based on Sph2Pob, we propose (1) a differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time cost and high accuracy and (2) an agent Loss, Sph2Pob-Loss, for spherical detection with high flexibility and expansibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Scene analysis and understanding
953
APR: Online Distant Point Cloud Registration through Aggregated Point Cloud Reconstruction
For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes. Code is available at https://github.com/liuQuan98/APR.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Machine learning for vision
962
Quick Multi-Robot Motion Planning by Combining Sampling and Search
We propose a novel algorithm to solve multi-robot motion planning (MRMP) rapidly, called Simultaneous Sampling-and-Search Planning (SSSP). Conventional MRMP studies mostly take the form of two-phase planning that constructs roadmaps and then finds inter-robot collision-free paths on those roadmaps. In contrast, SSSP simultaneously performs roadmap construction and collision-free pathfinding. This is realized by uniting techniques of single-robot sampling-based motion planning and search techniques of multi-agent pathfinding on discretized spaces. Doing so builds a small search space, leading to quick MRMP. SSSP is guaranteed to eventually find a solution if one exists. Our empirical evaluations in various scenarios demonstrate that SSSP significantly outperforms standard approaches to MRMP, i.e., it solves more problem instances much faster. We also applied SSSP to planning for 32 ground robots in a dense situation.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Distributed and multi-agent planning
Robotics -> ROB: Motion and path planning
Robotics -> ROB: Multi-robot systems
974
DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations. However, there is a gap between the novelty of an observation and an exploration, as both the stochasticity in the environment and the agent’s behavior may affect the observation. To evaluate exploratory behaviors accurately, we propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and then implement the reward with a discriminative forward model. Extensive experiments on both standard and advanced exploration tasks in MiniGrid show that DEIR quickly learns a better policy than the baselines. Our evaluations on ProcGen demonstrate both the generalization capability and the general applicability of our intrinsic reward.
List of keywords 
Machine Learning -> ML: Reinforcement learning
978
Helpful Information Sharing for Partially Informed Planning Agents
In many real-world settings, an autonomous agent may not have sufficient information or sensory capabilities to accomplish its goals, even when they are achievable. In some cases, the needed information can be provided by another agent, but information sharing might be costly due to limited communication bandwidth and other constraints. We address the problem of Helpful Information Sharing (HIS), which focuses on selecting minimal information to reveal to a partially informed agent in order to guarantee it can achieve its goal. We offer a novel compilation of HIS to a classical planning problem, which can be solved efficiently by any off-the-shelf planner. We provide guarantees of optimality for our approach and describe its extensions to maximize robustness and support settings in which the agent needs to decide which sensors to deploy in the environment. We demonstrate the power of our approaches on a set of standard benchmarks as well as on a novel benchmark.
List of keywords
Planning and Scheduling -> PS: Planning with Incomplete Information
Planning and Scheduling -> PS: Model-based reasoning
981
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
Audio visual segmentation (AVS) aims to segment the sounding objects for each frame of a given video. To distinguish the sounding objects from silent ones, both audio-visual semantic correspondence and temporal interaction are required. The previous method applies multi-frame cross-modal attention to conduct pixel-level interactions between audio features and visual features of multiple frames simultaneously, which is both redundant and implicit. In this paper, we propose an Audio-Queried Transformer architecture, AQFormer, where we define a set of object queries conditioned on audio information and associate each of them to particular sounding objects. Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries. Besides, an Audio-Bridged Temporal Interaction module is proposed to exchange sounding object-relevant information among multiple frames with the bridge of audio features. Extensive experiments are conducted on two AVS benchmarks to show that our method achieves state-of-the-art performances, especially 7.1% M_J and 7.6% M_F gains on the MS3 setting.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Video analysis and understanding
990
ProMix: Combating Label Noise via Maximizing Clean Sample Utility
Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To address this, we propose a novel LNL framework, ProMix, that attempts to maximize the utility of clean samples for boosted performance. As the key to our method, we propose a matched high-confidence selection technique that selects those examples with high confidence scores and predictions matching the given labels to dynamically expand a base clean sample set. To overcome the potential side effect of an excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.
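The matched high-confidence selection rule described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the fixed threshold of 0.95, and the toy probabilities are assumptions made only for illustration, and the abstract does not say how the base clean set or the confidence scores are obtained.

```python
from typing import Sequence, Set


def matched_high_confidence_selection(
    probs: Sequence[Sequence[float]],
    given_labels: Sequence[int],
    base_clean: Set[int],
    threshold: float = 0.95,  # hypothetical value, not taken from the paper
) -> Set[int]:
    """Expand a base clean set with samples whose confident prediction
    matches the (possibly noisy) given label."""
    selected = set(base_clean)
    for idx, (p, label) in enumerate(zip(probs, given_labels)):
        predicted = max(range(len(p)), key=lambda c: p[c])  # argmax class
        if predicted == label and p[predicted] >= threshold:
            selected.add(idx)
    return selected


# Toy example: four samples, three classes.
probs = [
    [0.98, 0.01, 0.01],  # confident and matches label 0 -> selected
    [0.10, 0.85, 0.05],  # matches label 1 but below the threshold
    [0.02, 0.97, 0.01],  # confident, but prediction 1 != given label 0
    [0.30, 0.30, 0.40],  # low confidence, already in the base clean set
]
given_labels = [0, 1, 0, 2]
expanded = matched_high_confidence_selection(probs, given_labels, base_clean={3})
print(sorted(expanded))  # [0, 3]
```

Requiring both conditions means a confidently predicted sample whose prediction disagrees with its given label (sample 2 above) is not added to the clean set.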
List of keywords 
Machine Learning -> ML: Weakly supervised learning
997
Outsourcing Adjudication to Strategic Jurors
We study a scenario where an adjudication task (e.g., the resolution of a binary dispute) is outsourced to a set of agents who are appointed as jurors. This scenario is particularly relevant in a Web3 environment, where no verification of the adjudication outcome is possible, and the appointed agents are, in principle, indifferent to the final verdict. We consider simple adjudication mechanisms that use (1) majority voting to decide the final verdict and (2) a payment function to reward the agents with the majority vote and possibly punish the ones in the minority. Agents interact with such a mechanism strategically: they exert some effort to understand how to properly judge the dispute and cast a yes/no vote that depends on this understanding and on information they have about the rest of the votes. Eventually, they vote so that their utility (i.e., their payment from the mechanism minus the cost due to their effort) is maximized. Under reasonable assumptions about how an agent’s effort is related to her understanding of the dispute, we show that appropriate payment functions can be used to recover the correct adjudication outcome with high probability. Our findings follow from a detailed analysis of the induced strategic game and make use of both theoretical arguments and simulation experiments.
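As a rough illustration of the kind of mechanism this abstract describes, the sketch below combines majority voting with a simple payment rule and computes each juror's utility as payment minus effort cost. The reward and penalty amounts, the tie-breaking rule, and the function names are assumptions chosen for the example; the abstract does not specify the paper's payment functions.

```python
from typing import List, Tuple


def adjudicate(
    votes: List[bool],
    reward: float = 1.0,   # hypothetical payment to majority voters
    penalty: float = 0.5,  # hypothetical fine for minority voters
) -> Tuple[bool, List[float]]:
    """Return the majority verdict and one payment per juror."""
    yes = sum(votes)
    no = len(votes) - yes
    verdict = yes > no  # assumption: ties resolve to "no"
    payments = [reward if v == verdict else -penalty for v in votes]
    return verdict, payments


def utilities(payments: List[float], effort_costs: List[float]) -> List[float]:
    """Utility = payment from the mechanism minus the cost of effort."""
    return [p - c for p, c in zip(payments, effort_costs)]


votes = [True, True, False]
effort_costs = [0.25, 0.125, 0.5]
verdict, payments = adjudicate(votes)
print(verdict)                             # True
print(payments)                            # [1.0, 1.0, -0.5]
print(utilities(payments, effort_costs))   # [0.75, 0.875, -1.0]
```

In this toy run the juror who exerted the most effort still ends up with the lowest utility because the vote landed in the minority, which is why the choice of payment function matters for the incentives.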
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
1003
New Fairness Concepts for Allocating Indivisible Items
For the fundamental problem of fairly dividing a set of indivisible items among agents, envy-freeness up to any item (EFX) and maximin share fairness (MMS) are arguably the most compelling fairness concepts proposed until now. Unfortunately, despite significant efforts over the past few years, whether EFX allocations always exist is still an enigmatic open problem, let alone their efficient computation. Furthermore, today we know that MMS allocations are not always guaranteed to exist. These facts weaken the usefulness of both EFX and MMS, despite their appealing conceptual characteristics.
We propose two alternative fairness concepts—called epistemic EFX (EEFX) and minimum EFX value fairness (MXS)—inspired by EFX and MMS. For both, we explore their relationships to well-studied fairness notions and, more importantly, prove that EEFX and MXS allocations always exist and can be computed efficiently for additive valuations. Our results justify that the new fairness concepts are excellent alternatives to EFX and MMS.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
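As background for the abstract above, here is a minimal sketch of checking the standard EFX condition under additive, non-negative valuations: no agent may prefer another agent's bundle after any single item is removed from it. The data layout and function name are assumptions for illustration. The new notions EEFX and MXS are not implemented here.

```python
def is_efx(valuations, allocation):
    """valuations[i][g]: additive value of item g for agent i.
    allocation[i]: set of items given to agent i."""
    def value(agent, bundle):
        return sum(valuations[agent][g] for g in bundle)

    n = len(allocation)
    for i in range(n):
        own = value(i, allocation[i])
        for j in range(n):
            if i == j:
                continue
            # Envy must vanish after removing ANY single item from j's bundle.
            for g in allocation[j]:
                if own < value(i, allocation[j] - {g}):
                    return False
    return True


vals = [[3, 2, 1], [3, 2, 1]]            # two agents, identical valuations
print(is_efx(vals, [{0}, {1, 2}]))       # True
print(is_efx(vals, [{0, 1}, {2}]))       # False
```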
1009
Dynamic Flows on Curved Space Generated by Labeled Data
The scarcity of labeled data is a long-standing challenge for many machine learning tasks. We propose a gradient flow method that leverages an existing dataset (i.e., the source) to generate new samples that are close to the dataset of interest (i.e., the target). We lift both datasets to the space of probability distributions on the feature-Gaussian manifold, and then develop a gradient flow method that minimizes the maximum mean discrepancy loss. To perform the gradient flow of distributions on the curved feature-Gaussian space, we unravel the Riemannian structure of the space and explicitly compute the Riemannian gradient of the loss function induced by the optimal transport metric. For practical applications, we also propose a discretized flow and provide conditional results guaranteeing the global convergence of the flow to the optimum. We illustrate the results of our proposed gradient flow method on several real-world datasets and show that our method can improve the accuracy of classification models in transfer learning settings.
List of keywords
Machine Learning -> ML: Optimization
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Few-shot learning
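The loss minimized in the abstract above, maximum mean discrepancy (MMD), can be estimated from two samples as sketched below. The Gaussian kernel, its bandwidth, and the plain Euclidean feature vectors are assumptions for illustration. The paper's lift to the feature-Gaussian manifold and its Riemannian gradient are not reproduced.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


source = [(0.0, 0.0), (1.0, 0.0)]
target = [(3.0, 3.0), (4.0, 3.0)]
print(mmd_squared(source, source))  # 0.0 for identical samples
print(mmd_squared(source, target))  # positive for samples that differ
```

A flow that moves source samples toward the target would decrease this quantity step by step.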
1021
Fair Division with Two-Sided Preferences
We study a fair division setting in which a number of players are to be fairly distributed among a set of teams. In our model, not only do the teams have preferences over the players as in the canonical fair division setting, but the players also have preferences over the teams. We focus on guaranteeing envy-freeness up to one player (EF1) for the teams together with a stability condition for both sides. We show that an allocation satisfying EF1, swap stability, and individual stability always exists and can be computed in polynomial time, even when teams may have positive or negative values for players. Similarly, a balanced and swap stable allocation that satisfies a relaxation of EF1 can be computed efficiently. When teams have nonnegative values for players, we prove that an EF1 and Pareto optimal allocation exists and, if the valuations are binary, can be found in polynomial time. We also examine the compatibility between EF1 and justified envy-freeness.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
1022
Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution
Recently, Decision Transformer (DT) pioneered recasting offline RL as a contextual conditional sequence modeling paradigm, which leverages self-attended autoregression to learn from global target rewards, states, and actions. However, in many applications these signals are severely delayed; for example, the agent may only obtain a reward signal at the end of each trajectory. This delay causes an unwanted bias to accumulate when learning autoregressively from global signals. In this paper, we focus on episodic reinforcement learning with trajectory feedback as a representative case. We propose a new reward redistribution algorithm that learns parameterized reward functions and decomposes the long-delayed reward onto each timestep. To improve the adaptability of the redistribution, we formulate this decomposition as a bi-level optimization problem that targets global optimality. We extensively evaluate the proposed method on various benchmarks and demonstrate an overwhelming performance improvement under long-delayed settings.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: POMDPs
Uncertainty in AI -> UAI: Sequential decision making
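To illustrate why a reward that arrives only at the end of a trajectory gives an uninformative conditioning signal, the toy below compares returns-to-go before and after redistributing the episodic return over timesteps. The uniform split is a deliberately naive stand-in and an assumption of this sketch; the paper learns a parameterized reward function instead.

```python
def returns_to_go(rewards):
    """Sum of rewards from each timestep to the end of the trajectory."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def redistribute_uniform(rewards):
    # Naive baseline: spread the trajectory return evenly over all timesteps,
    # keeping the total return unchanged.
    total = sum(rewards)
    return [total / len(rewards)] * len(rewards)


delayed = [0.0, 0.0, 0.0, 4.0]            # reward only at the last step
dense = redistribute_uniform(delayed)

print(returns_to_go(delayed))  # [4.0, 4.0, 4.0, 4.0]
print(returns_to_go(dense))    # [4.0, 3.0, 2.0, 1.0]
```

With the delayed reward, every timestep sees the same return-to-go, whereas the redistributed version varies along the trajectory while preserving the total.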
1028
Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms
Numerous machine learning models can be formulated as a stochastic minimax optimization problem, such as imbalanced data classification with AUC maximization.
Developing efficient algorithms to optimize such problems is both important and necessary. However, most existing algorithms restrict their focus to the single-machine setting, so they cannot cope with the large communication overhead of a distributed training system. Moreover, most existing communication-efficient optimization algorithms focus only on the traditional minimization problem and fail to handle the minimax optimization problem. To address these challenging issues, we develop two novel communication-efficient stochastic gradient descent ascent with momentum algorithms for the distributed minimax optimization problem, which can significantly reduce the communication cost via a two-way compression scheme. However, the compressed momentum makes it considerably challenging to investigate the convergence rate of our algorithms, especially in the presence of the interaction between the minimization and maximization subproblems. We successfully address these challenges and establish the convergence rate of our algorithms for nonconvex-strongly-concave problems. To the best of our knowledge, our algorithms are the first communication-efficient algorithms with theoretical guarantees for the minimax optimization problem. Finally, we apply our algorithms to the distributed AUC maximization problem for the imbalanced data classification task. Extensive experimental results confirm the efficacy of our algorithms in saving communication costs.
List of keywords
Machine Learning -> ML: Optimization
Machine Learning -> ML: Federated learning
Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
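The abstract above does not say which compressor its two-way scheme uses. As a hedged illustration of the general idea of compressing a vector before it is communicated, the sketch below uses top-k sparsification with an error-feedback residual. Both are common techniques chosen here as assumptions, not the paper's algorithm.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def compress_with_feedback(update, residual, k):
    # Add back what was dropped in earlier rounds, then compress again.
    corrected = [u + r for u, r in zip(update, residual)]
    sent = top_k(corrected, k)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual


update = [0.5, -2.0, 0.1, 1.0]
sent, residual = compress_with_feedback(update, [0.0] * 4, k=2)
print(sent)      # [0.0, -2.0, 0.0, 1.0]
print(residual)  # [0.5, 0.0, 0.1, 0.0]
```

In a two-way setting, the same kind of operator would be applied both to worker-to-server and to server-to-worker messages.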
1032
Fine-tuned vs. Prompt-tuned Supervised Representations: Which Better Account for Brain Language Representations?
To decipher the algorithm underlying the human brain’s language representation, previous work probed brain responses to language input with pre-trained artificial neural network (ANN) models fine-tuned on NLU tasks. However, full fine-tuning generally updates the entire parametric space and distorts pre-trained features, which is cognitively inconsistent with the brain’s robust multi-task learning ability. Prompt-tuning, in contrast, protects pre-trained weights and learns task-specific embeddings to fit a task. Could prompt-tuning generate representations that better account for the brain’s language representations than fine-tuning? If so, what kind of NLU task leads a pre-trained model to better decode the information represented in the human brain? We investigate these questions by comparing prompt-tuned and fine-tuned representations in neural decoding, that is, predicting the linguistic stimulus from the brain activities evoked by the stimulus. We find that full fine-tuning does not significantly outperform prompt-tuning in neural decoding on any of the 10 NLU tasks, implying that a more brain-consistent tuning method yields representations that better correlate with brain data. Moreover, we find that tasks dealing with fine-grained concept meaning yield representations that better decode brain activation patterns than other tasks, especially the syntactic chunking task. This indicates that our brain encodes more fine-grained concept information than shallow syntactic information when representing language.
List of keywords
Natural Language Processing -> NLP: Embeddings
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive modeling
1034
Multi-Agent Systems with Quantitative Satisficing Goals
In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of epsilon-equilibria and the existence of equilibria in games where agents have multiple thresholds.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
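The effect described in the abstract above, where agents abandon a mutually beneficial outcome for a negligible gain, can be seen in a toy two-player matrix game. The game, its payoffs, and the thresholds below are made up for illustration; the paper works with reactive systems and automata-based algorithms, not normal-form games.

```python
from itertools import product


def pure_nash(payoffs):
    """payoffs[(a1, a2)] = (u1, u2); returns all pure-strategy Nash equilibria."""
    actions1 = sorted({a for a, _ in payoffs})
    actions2 = sorted({b for _, b in payoffs})
    equilibria = []
    for a, b in product(actions1, actions2):
        u1, u2 = payoffs[(a, b)]
        no_gain_1 = all(payoffs[(x, b)][0] <= u1 for x in actions1)
        no_gain_2 = all(payoffs[(a, y)][1] <= u2 for y in actions2)
        if no_gain_1 and no_gain_2:
            equilibria.append((a, b))
    return equilibria


def satisfice(payoffs, thresholds):
    # A satisficing agent only cares whether its payoff meets its threshold.
    return {
        profile: tuple(int(u >= t) for u, t in zip(us, thresholds))
        for profile, us in payoffs.items()
    }


game = {
    ("C", "C"): (10, 10), ("D", "C"): (10.1, 0),
    ("C", "D"): (0, 10.1), ("D", "D"): (1, 1),
}
print(pure_nash(game))                        # [('D', 'D')]
print(pure_nash(satisfice(game, (5, 5))))     # includes ('C', 'C')
```

Under pure optimization the 0.1 gain from deviating destroys cooperation, while with a threshold of 5 the cooperative profile becomes an equilibrium.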
1045
Learning Efficient Truthful Mechanisms for Trading Networks
Trading networks are an indispensable part of today’s economy, but to compete successfully with others, they must be efficient in maximizing the value they provide to the external market. While prior work relies on truthful disclosure of private information to achieve efficiency, we study the problem of designing mechanisms that result in efficient trading networks by incentivizing firms to truthfully reveal their private information to a third party. Additional desirable properties of such mechanisms are weak budget balance (WBB; the third party need not invest) and individual rationality (IR; firms get non-negative utility). Unlike in combinatorial auctions, there may not exist mechanisms that simultaneously satisfy these properties ex post for trading networks. We propose an approach for computing or learning truthful and efficient mechanisms for given networks in a Bayesian setting, where WBB and IR are relaxed to ex ante and interim, respectively, for a given distribution over the private information. We incorporate techniques to reduce computational and sample complexity. We empirically demonstrate that the proposed approach successfully finds mechanisms with the relaxed properties for trading networks where achieving ex post properties is impossible.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Multidisciplinary Topics and Applications -> MDA: Economics
1063
Black-box Prompt Tuning for Vision-Language Model as a Service
In the scenario of Model-as-a-Service (MaaS), pre-trained models are usually released as inference APIs. Users are allowed to query those models with manually crafted prompts. Without access to the network structure and gradient information, it is tricky to perform continuous prompt tuning on MaaS, especially for vision-language models (VLMs), given their cross-modal interaction. In this paper, we propose a black-box prompt tuning framework for VLMs to learn task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in the intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance the cross-modal interaction. Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner.
List of keywords
Computer Vision -> CV: Vision and language
Machine Learning -> ML: Evolutionary learning
Machine Learning -> ML: Multi-modal learning
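The idea in the abstract above, searching a low-dimensional subspace with a derivative-free method and mapping each candidate to a full prompt, can be sketched as below. Everything here is an assumption for illustration: `black_box_loss` stands in for a score returned by an inference API, the projection is a fixed random matrix, and the optimizer is a very simple elitist evolution strategy rather than the strategies used in the paper.

```python
import random


def black_box_loss(prompt):
    # Hypothetical stand-in for a loss reported by an inference API.
    return sum((p - 0.5) ** 2 for p in prompt)


def tune_prompt(dim_full=20, dim_sub=4, population=8, sigma=0.3, steps=50, seed=0):
    rng = random.Random(seed)
    # Fixed random projection from the small search space to the full prompt.
    proj = [[rng.gauss(0, 1) / dim_sub ** 0.5 for _ in range(dim_sub)]
            for _ in range(dim_full)]

    def decode(z):
        return [sum(w * zi for w, zi in zip(row, z)) for row in proj]

    best = [0.0] * dim_sub
    best_loss = black_box_loss(decode(best))
    history = [best_loss]
    for _ in range(steps):
        for _ in range(population):
            # Perturb the current best; only loss values are used, no gradients.
            cand = [b + sigma * rng.gauss(0, 1) for b in best]
            loss = black_box_loss(decode(cand))
            if loss < best_loss:
                best, best_loss = cand, loss
        history.append(best_loss)
    return best, history


best, history = tune_prompt()
print(history[0], history[-1])  # the best loss never increases over the steps
```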
1065
An Exact Algorithm for the Minimum Dominating Set Problem
The Minimum Dominating Set (MDS) problem is a classic NP-hard combinatorial optimization problem with many practical applications. Solving MDS is computationally extremely challenging. Previous work on exact algorithms mainly focuses on improving the theoretical time complexity, and existing practical algorithms for MDS are almost all based on heuristic search. In this paper, we propose a novel lower bound and an exact algorithm for MDS. The algorithm implements a branch-and-bound (BnB) approach and employs the new lower bound to reduce the search space. Extensive empirical results show that the new lower bound is effective at reducing the search space and that the new algorithm is effective on both standard and real-world instances. To the best of our knowledge, this is the first effective BnB algorithm for MDS.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
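To make the branch-and-bound idea in the abstract above concrete, here is a minimal exact solver for small graphs. It uses a simple textbook lower bound (each chosen vertex dominates at most Δ+1 vertices), which is an assumption of this sketch and not the paper's novel bound.

```python
import math


def min_dominating_set(adj):
    """adj: dict mapping each vertex to the set of its neighbours (undirected)."""
    if not adj:
        return set()
    closed = {v: adj[v] | {v} for v in adj}           # closed neighbourhoods N[v]
    max_cover = max(len(c) for c in closed.values())  # max degree + 1
    best = set(adj)                                   # trivial solution: all vertices

    def search(chosen, undominated):
        nonlocal best
        if not undominated:
            if len(chosen) < len(best):
                best = set(chosen)
            return
        # Lower bound: each extra vertex dominates at most max_cover vertices.
        lower = len(chosen) + math.ceil(len(undominated) / max_cover)
        if lower >= len(best):
            return
        # Branch on an undominated vertex v: some vertex of N[v] must be chosen.
        v = min(undominated, key=lambda u: len(closed[u]))
        for u in closed[v]:
            search(chosen | {u}, undominated - closed[u])

    search(frozenset(), frozenset(adj))
    return best


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(len(min_dominating_set(path)))  # 2
```

A tighter lower bound prunes more branches, which is where a refined bound would plug in.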
1067
Few-shot Classification via Ensemble Learning with Multi-Order Statistics
Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining image representations that generalize well to novel classes is the key to improving few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error on the novel classes. Following this principle, we propose a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS). In this method, we use multiple branches after the backbone network to create the individual learners of the ensemble, with the goal of reducing the storage cost. We then introduce pooling of different-order statistics in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and that our method achieves state-of-the-art performance on multiple few-shot classification benchmark datasets.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1068
A Refined Upper Bound and Inprocessing for the Maximum K-plex Problem
A k-plex of a graph G is an induced subgraph in which every vertex is nonadjacent to at most k-1 other vertices of the subgraph. The Maximum k-plex Problem (MKP), which consists in finding a k-plex of the largest size, is NP-hard and has many applications. Existing exact algorithms mainly implement a branch-and-bound (BnB) approach and improve performance by integrating effective upper bounds and graph reduction rules. In this paper, we propose a refined upper bound, which is tighter than those of existing methods, and an inprocessing strategy, which performs graph reduction incrementally. We implement a new BnB algorithm for MKP that employs the two components to reduce the search space. Extensive experiments show that both the refined upper bound and the inprocessing strategy are very effective at reducing the search space. The new algorithm significantly outperforms the state-of-the-art algorithms on the tested benchmarks.
List of keywords
Search -> S: Combinatorial search and optimisation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
Search -> S: Heuristic search
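The k-plex definition given in the abstract above translates directly into a membership check. The sketch below also includes a brute-force search that is only usable on tiny graphs; it illustrates the problem statement and has nothing to do with the paper's upper bound or inprocessing strategy. Function names are assumptions.

```python
from itertools import combinations


def is_k_plex(adj, vertices, k):
    """True if every vertex misses at most k-1 other vertices of the subgraph."""
    vs = set(vertices)
    return all(len(vs - adj[v] - {v}) <= k - 1 for v in vs)


def max_k_plex_bruteforce(adj, k):
    # Exponential-time reference implementation, for tiny graphs only.
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if is_k_plex(adj, subset, k):
                return set(subset)
    return set()


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(len(max_k_plex_bruteforce(cycle4, 1)))  # 2 (a 1-plex is a clique)
print(len(max_k_plex_bruteforce(cycle4, 2)))  # 4 (each vertex misses one other)
```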
1072
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
The vision transformer (ViT) has emerged as a new paradigm in computer vision, showing excellent performance but at an expensive computational cost. Image token pruning is one of the main approaches for ViT compression, because the complexity is quadratic in the number of tokens and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens or apply a fixed-ratio pruning strategy to all input instances. In this work, we propose an adaptive sparse token pruning framework with minimal cost. Specifically, we first propose an inexpensive class attention scoring mechanism weighted by attention head importance. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores with the thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, yielding pruning configurations adapted to different input instances. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% with only a 0.2% drop in top-1 accuracy, achieving a better trade-off between accuracy and latency than previous methods.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Machine Learning -> ML: Attention models
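The scoring-and-thresholding step described in the abstract above can be sketched roughly as follows. The head weights and the threshold are fixed numbers here purely for illustration; in the paper the thresholds are learned, and how head importance is obtained is not specified in the abstract, so `head_weights` is a hypothetical input.

```python
def token_scores(class_attention, head_weights):
    """class_attention[h][j]: attention from the class token to token j in head h.
    Returns one score per token as a head-importance-weighted average."""
    total = sum(head_weights)
    n_tokens = len(class_attention[0])
    return [
        sum(w * head[j] for w, head in zip(head_weights, class_attention)) / total
        for j in range(n_tokens)
    ]


def prune(tokens, scores, threshold):
    # Keep only tokens whose score reaches the (here fixed) threshold.
    return [t for t, s in zip(tokens, scores) if s >= threshold]


attention = [[0.5, 0.25, 0.25],   # head 0
             [0.75, 0.25, 0.0]]   # head 1
scores = token_scores(attention, head_weights=[1.0, 3.0])
print(scores)                                # [0.6875, 0.25, 0.0625]
print(prune(["a", "b", "c"], scores, 0.2))   # ['a', 'b']
```

Because the number of kept tokens depends on the scores, different inputs end up with different pruning ratios.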
1078
Hierarchical State Abstraction based on Structural Information Principles
State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities, resulting in essential information loss that affects their performance on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, we present an unsupervised, adaptive hierarchical state clustering method that requires no manual assistance and, at the same time, generates an optimal encoding tree. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency by up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to further improve their performance.
List of keywords
Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Deep reinforcement learning
1099
LGPConv: Learnable Gaussian Perturbation Convolution for Lightweight Pansharpening
Pansharpening is a crucial and challenging task that aims to obtain a high spatial resolution image by merging a multispectral (MS) image and a panchromatic (PAN) image. Current methods use CNNs with standard convolution, but we’ve observed strong correlation among channel dimensions in the kernel, leading to computational burden and redundancy. To address this, we propose Learnable Gaussian Perturbation Convolution (LGPConv), surpassing standard convolution. LGPConv leverages two properties of standard convolution kernels: 1) correlations within channels, learning a premier kernel as a base to reduce parameters and training difficulties caused by redundancy; 2) introducing Gaussian noise perturbations to simulate randomness and enhance nonlinear representation within channels. We incorporate LGPConv into a well-designed pansharpening network and demonstrate its superiority through extensive experiments, achieving state-of-the-art performance with minimal parameters (27K). Code is available on the GitHub page of the authors.
List of keywords
Machine Learning -> ML: Convolutional networks
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Applications
1100
Improving Heterogeneous Model Reuse by Density Estimation
This paper studies multiparty learning, which aims to learn a model using the private data of different participants. Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party. Considering the potential sample selection bias among different parties, some heterogeneous model reuse approaches have been developed. However, although pre-trained local classifiers are utilized in these approaches, the characteristics of the local data are not well exploited. This motivates us to estimate the density of local data and design an auxiliary model together with the local classifiers for reuse. To address the scenarios where some local models are not well pre-trained, we further design a multiparty cross-entropy loss for calibration. Building upon existing works, we address a challenging problem of heterogeneous model reuse from a decision theory perspective and take advantage of recent advances in density estimation. Experimental results on both synthetic and benchmark data demonstrate the superiority of the proposed method.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Classification
1111
Violin: Virtual Overbridge Linking for Enhancing Semi-supervised Learning on Graphs with Limited Labels
Graph Neural Networks (GNNs) are a family of promising tools for graph semi-supervised learning. However, in training, most existing GNNs rely heavily on a large amount of labeled data, which is rare in real-world scenarios. Unlabeled data with useful information are usually under-exploited, which limits the representation power of GNNs. To handle these problems, we propose Virtual Overbridge Linking (Violin), a generic framework to enhance the learning capacity of common GNNs. By learning to add virtual overbridges between two nodes that are estimated to be semantically consistent, labeled and unlabeled data can be correlated. Supervised information can be well utilized in training while simultaneously inducing the model to learn from unlabeled data. Discriminative relation patterns extracted from unlabeled nodes can also be shared with other nodes even if they are remote from each other. Motivated by recent advances in data augmentation, we additionally integrate Violin with consistency-regularized training. Such a scheme yields node representations with better robustness, which significantly enhances a GNN. Violin can be readily extended to a wide range of GNNs without introducing additional learnable parameters. Extensive experiments on six datasets demonstrate that our method is effective and robust under low-label rate scenarios, where Violin can boost some GNNs’ performance by over 10% on node classification.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Representation learning
1117
Co-training with High-Confidence Pseudo Labels for Semi-supervised Medical Image Segmentation
Consistency regularization and pseudo labeling-based semi-supervised methods perform co-training using the pseudo labels from multi-view inputs. However, such co-training models tend to converge early to a consensus, degenerating into self-training models, and produce low-confidence pseudo labels from the perturbed inputs during training.
To address these issues, we propose an Uncertainty-guided Collaborative Mean-Teacher (UCMT) for semi-supervised semantic segmentation with high-confidence pseudo labels. Concretely, UCMT consists of two main components: 1) collaborative mean-teacher (CMT) for encouraging model disagreement and performing co-training between the sub-networks, and 2) uncertainty-guided region mix (UMIX) for manipulating the input images according to the uncertainty maps of CMT and helping CMT produce high-confidence pseudo labels.
Combining the strengths of UMIX with CMT, UCMT can retain model disagreement and enhance the quality of pseudo labels for the co-training segmentation.
Extensive experiments on four public medical image datasets including 2D and 3D modalities demonstrate the superiority of UCMT over the state-of-the-art.
Code is available at: https://github.com/Senyh/UCMT.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Computer Vision -> CV: Biomedical image analysis
Computer Vision -> CV: Segmentation
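Two generic building blocks behind the kind of approach in the abstract above are the exponential-moving-average teacher update used in mean-teacher training and the filtering of pseudo labels by confidence. The sketch below shows both on plain lists. The decay, the threshold, and the use of the top class probability as the confidence measure are assumptions; UCMT's uncertainty maps and region mixing are not reproduced (see the authors' repository for the actual method).

```python
def ema_update(teacher, student, decay=0.99):
    """Mean-teacher style update: the teacher tracks a moving average of the student."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


def confident_pseudo_labels(probabilities, threshold=0.9):
    """Keep the argmax label of a pixel only if its top probability is high enough.
    Pixels below the threshold get None and would be ignored in the loss."""
    labels = []
    for probs in probabilities:
        top = max(range(len(probs)), key=probs.__getitem__)
        labels.append(top if probs[top] >= threshold else None)
    return labels


print(ema_update([0.0, 1.0], [1.0, 1.0], decay=0.5))          # [0.5, 1.0]
print(confident_pseudo_labels([[0.95, 0.05], [0.6, 0.4]]))    # [0, None]
```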
1132
Imbalanced Node Classification Beyond Homophilic Assumption
Imbalanced node classification is widespread in real-world networks, where graph neural networks (GNNs) are usually heavily biased toward majority classes and suffer from severe performance degradation when classifying minority-class nodes. Various imbalanced node classification methods have been proposed recently that construct synthetic nodes and edges w.r.t. minority classes to balance the label/topology distribution. However, they are all based on the homophilic assumption that nodes of the same label tend to connect, despite the wide existence of heterophilic edges in real-world graphs. Thus, they uniformly aggregate features from both homophilic and heterophilic neighbors and rely on feature similarity to generate synthetic edges, so they cannot be applied to imbalanced graphs with high heterophily. To address this problem, we propose a novel GraphSANN for imbalanced node classification on both homophilic and heterophilic graphs. First, we propose a unified feature mixer to generate synthetic nodes with both homophilic and heterophilic interpolation in a unified way. Next, by randomly sampling edges between synthetic nodes and existing nodes as candidate edges, we design an adaptive subgraph extractor to dynamically extract the contextual subgraphs of candidate edges with flexible ranges. Finally, we develop a multi-filter subgraph encoder that constructs multiple different filter channels to discriminatively aggregate neighbors’ information along the homophilic and heterophilic edges. Extensive experiments on eight benchmark datasets demonstrate the superiority of our model for imbalanced node classification on both homophilic and heterophilic graphs.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Learning graphical models
1134
Minimally Supervised Contextual Inference from Human Mobility: An Iterative Collaborative Distillation Framework
Contextual information about trips and users in mobility data is valuable for mobile service providers to understand their customers and improve their services. Existing inference methods require a large number of labels for training, a requirement that is hard to meet in practice. In this paper, we study a more practical yet challenging setting: contextual inference using mobility data with minimal supervision (i.e., a few labels per class and massive unlabeled data). A typical solution is to apply semi-supervised methods that follow a self-training framework to bootstrap a model based on all features. However, using a limited labeled set brings a high risk of overfitting to self-training, leading to unsatisfactory performance. We propose a novel collaborative distillation framework, STCOLAB. It sequentially trains spatial and temporal modules at each iteration following the supervision of ground-truth labels. In addition, it distills knowledge to the module being trained using the logits produced by the latest trained module of the other modality, thereby mutually calibrating the two modules and combining the knowledge from both modalities. Extensive experiments on two real-world datasets show that STCOLAB achieves significantly more accurate contextual inference than various baselines.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Semi-supervised learning
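Distilling knowledge from one module's logits into another, as mentioned in the abstract above, is commonly done with a temperature-softened KL divergence. The sketch below shows that standard loss. The temperature value and the exact form of the loss are assumptions, since the abstract does not specify how STCOLAB combines the distillation term with the supervised term.

```python
import math


def softmax(logits, temperature=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


spatial_logits = [2.0, 0.5, -1.0]    # e.g. from the most recently trained module
temporal_logits = [0.0, 0.0, 0.0]    # module currently being trained
print(distillation_loss(temporal_logits, spatial_logits))  # positive
print(distillation_loss(spatial_logits, spatial_logits))   # 0.0
```

Alternating which module plays the teacher role is what lets the two modalities calibrate each other.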
1137
Robust Image Ordinal Regression with Controllable Image Generation
Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization)
1145
COOL, a Context Outlooker, and Its Application to Question Answering and Other Natural Language Processing Tasks
Vision outlooker improves the performance of vision transformers, which implement a self-attention mechanism, by adding outlook attention, a form of local attention.
In natural language processing, as has been the case in computer vision and other domains, transformer-based models constitute the state-of-the-art for most processing tasks. In this domain, too, many authors have argued and demonstrated the importance of local context.
We present an outlook attention mechanism, COOL, for natural language processing. COOL, added on top of the self-attention layers of a transformer-based model, encodes local syntactic context considering word proximity and more pair-wise constraints than dynamic convolution used by existing approaches.
A comparative empirical performance evaluation of an implementation of COOL with different transformer-based models confirms the opportunity for improvement over a baseline using the original models alone for various natural language processing tasks, including question answering. The proposed approach achieves competitive performance with existing state-of-the-art methods on some tasks.
Natural Language Processing -> NLP: Language models
List of keywords 
Natural Language Processing -> NLP: Question answering Natural Language Processing -> NLP: Language models
1152
WBFlow: Few-shot White Balance for sRGB Images via Reversible Neural Flows
sRGB white balance methods aim to correct the nonlinear color cast of sRGB images without accessing raw values. Although existing methods have achieved increasingly better results, their generalization to sRGB images from multiple cameras is still underexplored. In this paper, we propose a network named WBFlow that not only performs superior white balance for sRGB images but also generalizes well to multiple cameras. Specifically, we take advantage of neural flow to ensure the reversibility of WBFlow, which enables lossless rendering of color-cast sRGB images back to pseudo raw features for linear white balancing and thus achieves superior performance. Furthermore, inspired by camera transformation approaches, we have designed a camera transformation (CT) in pseudo raw feature space to generalize WBFlow to different cameras via few-shot learning. By utilizing a few sRGB images from an untrained camera, our WBFlow can perform well on this camera by learning the camera-specific parameters of CT. Extensive experiments show that WBFlow achieves superior camera generalization and accuracy on three public datasets as well as our rendered multiple-camera sRGB dataset. Our code is available at https://github.com/ChunxiaoLe/WBFlow.
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: Computational photography Computer Vision -> CV: Applications
1155
Deep Multi-view Subspace Clustering with Anchor Graph
Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). To be specific, DMCAG firstly learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct an anchor graph with small size for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy to keep the consistency of the labels based on contrastive learning to enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Self-supervised Learning
List of keywords 
Machine Learning -> ML: Clustering Machine Learning -> ML: Multi-view learning
Machine Learning -> ML: Self-supervised Learning
1180
Faster Exact MPE and Constrained Optimization with Deterministic Finite State Automata
We propose a concise function representation based on deterministic finite state automata for exact most probable explanation and constrained optimization tasks in graphical models. We then exploit our concise representation within Bucket Elimination (BE). We denote our version of BE as FABE. FABE significantly improves the performance of BE in terms of runtime and memory requirements by minimizing redundancy. Indeed, results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art, leading to significant runtime improvements (up to 2 orders of magnitude in our tests).
Uncertainty in AI -> UAI: Graphical models
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint optimization Uncertainty in AI -> UAI: Graphical models
1194
Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization
Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input.
In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards a more precise action recognition while attention capitalizes on those important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA.
Computer Vision -> CV: Action and behavior recognition
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Action and behavior recognition
1201
Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning
In multi-agent reinforcement learning (MARL), self-interested agents attempt to establish equilibrium and achieve coordination depending on game structure. However, existing MARL approaches are mostly bound by the simultaneous actions of all agents in the Markov game (MG) framework, and few works consider the formation of equilibrium strategies via asynchronous action coordination. In view of the advantages of Stackelberg equilibrium (SE) over Nash equilibrium, we construct a spatio-temporal sequential decision-making structure derived from the MG and propose an N-level policy model based on a conditional hypernetwork shared by all agents. This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents. Agents can learn heterogeneous SE policies while still maintaining parameter sharing, which leads to reduced cost for learning and storage and enhanced scalability as the number of agents increases. Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios, and performs admirably in immensely complex settings including cooperative tasks and mixed tasks.
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
1213
Enhancing Datalog Reasoning with Hypertree Decompositions
Datalog reasoning based on the seminaive evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental reasoning with recursive Datalog programs. Moreover, hypertree decompositions require additional data structures and thus introduce nonnegligible overhead in both runtime and memory consumption. In this paper, we provide algorithms that exploit hypertree decompositions for the materialisation and incremental evaluation of Datalog programs. Furthermore, we combine this approach with standard Datalog reasoning algorithms in a modular fashion so that the overhead caused by the decompositions is reduced. Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude.
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Semantic Web
List of keywords 
Knowledge Representation and Reasoning -> KRR: Logic programming Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Semantic Web
1219
Federated Graph Semantic and Structural Learning
Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most related work focuses on traditional distributed tasks such as images and voices and is incapable of handling graph structures. This paper first reveals that local client distortion is caused by both node-level semantics and graph-level structure. First, for node-level semantics, we find that contrasting nodes from distinct classes is beneficial for providing well-performing discrimination. We pull each local node towards the global node of the same class and push it away from the global nodes of different classes. Second, we postulate that a well-structured graph neural network possesses similarity for neighbors due to the inherent adjacency relationships. However, aligning each node with adjacent nodes hinders discrimination due to the potential class inconsistency. We transform the adjacency relationships into a similarity distribution and leverage the global model to distill the relation knowledge into the local model, which preserves the structural information and discriminability of the local model. Empirical results on three graph datasets demonstrate the superiority of the proposed method over its counterparts.
Machine Learning -> ML: Sequence and graph learning
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Sequence and graph learning
1220
GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control
The use of multi-agent reinforcement learning (MARL) methods in coordinating traffic lights (CTL) has become increasingly popular, treating each intersection as an agent. However, existing MARL approaches either treat all agents as absolutely homogeneous, i.e., the same network and parameters for each agent, or treat all agents as completely heterogeneous, i.e., different networks and parameters for each agent. This creates a difficult balance between accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agents' environments, considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions to maintain a learnable and dynamic clustering, one that uses mutual information estimation for better stability, and the other that maximizes separability between groups. Finally, GPLight enforces the agents in a group to share the same network and parameters. This approach reduces complexity by promoting cooperation within the same group of agents while reflecting differences between groups to ensure accuracy. To verify the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, with up to 1,089 intersections. Compared with state-of-the-art methods, experimental results demonstrate the superiority of our proposed method, especially in large-scale CTL.
Machine Learning -> ML: Deep reinforcement learning
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Applications Machine Learning -> ML: Deep reinforcement learning
1223
Augmenting Automated Spectrum Based Fault Localization for Multiple Faults
Spectrum-based Fault Localization (SBFL) uses the coverage of test cases and their outcome (pass/fail) to predict the "suspiciousness" of program components, e.g., lines of code. SBFL is, perhaps, the most successful fault localization technique due to its simplicity and scalability. However, SBFL heuristics do not perform well in scenarios where a program may have multiple faulty components. In this work, we propose a new algorithm that "augments" previously proposed SBFL heuristics to produce a ranked list where faulty components ranked low by base SBFL metrics are ranked significantly higher. We implement our ideas in a tool, ARTEMIS, that attempts to "bubble up" faulty components which are ranked lower by base SBFL metrics. We compare our technique to the most popular SBFL metrics and demonstrate statistically significant improvement in the developer effort for fault localization with respect to the basic strategies.
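To make the notion of a base SBFL metric concrete, here is a small sketch that ranks components by the Ochiai formula, one commonly used suspiciousness metric. The abstract does not say which base metrics ARTEMIS augments, so the choice of Ochiai, the function names, and the toy coverage matrix are assumptions for illustration only; this is not the ARTEMIS algorithm.

```python
import math

def ochiai_scores(coverage, outcomes):
    """Compute an Ochiai suspiciousness score per component.

    coverage[t][c] is True if test t executes component c.
    outcomes[t] is True if test t passed, False if it failed.
    """
    n_components = len(coverage[0])
    total_failed = sum(1 for passed in outcomes if not passed)
    scores = []
    for c in range(n_components):
        # ef: failing tests that execute c; ep: passing tests that execute c.
        ef = sum(1 for row, passed in zip(coverage, outcomes) if row[c] and not passed)
        ep = sum(1 for row, passed in zip(coverage, outcomes) if row[c] and passed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom > 0 else 0.0)
    return scores

def rank_components(scores):
    """Return component indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)

# Toy spectrum: three tests over three components (hypothetical data).
coverage = [
    [True, True, False],   # test 0 (passes)
    [True, False, True],   # test 1 (fails)
    [True, True, True],    # test 2 (fails)
]
outcomes = [True, False, False]

scores = ochiai_scores(coverage, outcomes)
ranking = rank_components(scores)
```

In this toy spectrum, component 2 is executed only by failing tests and is ranked first. With several faults, a faulty component covered by many passing tests can score low under such a metric, which is the situation the abstract targets.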
Multidisciplinary Topics and Applications -> MDA: Software engineering
List of keywords 
Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning Multidisciplinary Topics and Applications -> MDA: Software engineering
1224
From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement
Most existing Low-Light Image Enhancement (LLIE) methods are primarily designed to improve brightness in dark regions, which suffer from severe degradation in nighttime images. However, these methods have paid limited attention to another major source of visibility degradation: the glow effects in real night scenes. Glow effects are inevitable in the presence of artificial light sources and cause further diffused blurring when directly enhanced. To address this issue, we treat the glow suppression task as learning physical glow generation via multiple scattering estimation according to the Atmospheric Point Spread Function (APSF). In response to the challenges posed by uneven glow intensity and varying source shapes, an APSF-based Nighttime Imaging Model with Near-field Light Sources (NIM-NLS) is specifically derived to design a scalable Light-aware Blind Deconvolution Network (LBDN). The glow-suppressed result is then brightened via a Retinex-based Enhancement Module (REM). Remarkably, the proposed glow suppression method is based on zero-shot learning and does not rely on any paired or unpaired training data. Empirical evaluations demonstrate the effectiveness of the proposed method in both glow suppression and low-light enhancement tasks.
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Segmentation
List of keywords 
Computer Vision -> CV: Machine learning for vision Computer Vision -> CV: Computational photography
Computer Vision -> CV: Segmentation
1231
Rainbow Cycle Number and EFX Allocations: (Almost) Closing the Gap
Recently, several studies on the fair allocation of indivisible goods have noticed a connection between a purely combinatorial problem called the Rainbow Cycle problem and a fairness notion known as EFX: assuming that the rainbow cycle number for parameter d (i.e., R(d)) is O(d^β log(d)^γ), we can find a (1 − ϵ)-EFX allocation with O_ϵ(n^(β/(β+1)) log(n)^(γ/(β+1))) discarded goods. The best upper bound on R(d) has been improved in a series of works to O(d^4), O(d^(2+o(1))), and finally to O(d^2). Also, via a simple observation, we have R(d) ∈ Ω(d).
In this paper, we introduce another problem in extremal combinatorics. For a parameter l, we define the rainbow path degree and denote it by H(l). We show that any lower bound on H(l) yields an upper bound on R(d). Next, we prove that H(l) ∈ Ω(l^2 / log(l)), which yields an almost tight upper bound of R(d) ∈ O(d log(d)). This, in turn, proves the existence of a (1 − ϵ)-EFX allocation with O_ϵ(√(n log(n))) discarded goods. In addition, for the special case of the Rainbow Cycle problem in which the edges in each part form a permutation, we improve the upper bound to R(d) ≤ 2d − 4. We leverage H(l) to achieve this bound.
Our conjecture is that the exact value of H(l) is ⌊l^2/2⌋ − 1. We provide some experiments that support this conjecture. Assuming this conjecture is correct, we have R(d) ∈ Θ(d).
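As a worked check of how a bound on R(d) translates into a number of discarded goods, the general statement quoted in the abstract can be instantiated for each bound it mentions. This is only arithmetic on the stated exponents, reading the new upper bound as β = 1, γ = 1; it is not a derivation from the paper itself.

```latex
\[
R(d) \in O\!\left(d^{\beta}\log^{\gamma} d\right)
\;\Longrightarrow\;
\#\text{discarded goods} \in
O_{\epsilon}\!\left(n^{\frac{\beta}{\beta+1}}\,\log^{\frac{\gamma}{\beta+1}} n\right)
\]
\[
\begin{aligned}
R(d) \in O(d^{4}) \;(\beta=4,\ \gamma=0) &:\quad O_{\epsilon}\!\left(n^{4/5}\right)\\
R(d) \in O(d^{2}) \;(\beta=2,\ \gamma=0) &:\quad O_{\epsilon}\!\left(n^{2/3}\right)\\
R(d) \in O(d\log d) \;(\beta=1,\ \gamma=1) &:\quad
O_{\epsilon}\!\left(n^{1/2}\log^{1/2} n\right) = O_{\epsilon}\!\left(\sqrt{n\log n}\right)\\
R(d) \in \Theta(d) \;(\beta=1,\ \gamma=0) &:\quad O_{\epsilon}\!\left(\sqrt{n}\right)
\end{aligned}
\]
```

The last row shows what the conjectured linear bound would give if it holds.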
Multidisciplinary Topics and Applications -> MDA: Economics
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Multidisciplinary Topics and Applications -> MDA: Economics
1235
Independent Feature Decomposition and Instance Alignment for Unsupervised Domain Adaptation
Existing Unsupervised Domain Adaptation (UDA) methods typically attempt to perform knowledge transfer in a domain-invariant space explicitly or implicitly. In practice, however, the obtained features are often mixed with domain-specific information, which causes performance degradation. To overcome this fundamental limitation, this article presents a novel independent feature decomposition and instance alignment method (IndUDA for short). Specifically, based on an invertible flow, we project the base features into a decomposed latent space with domain-invariant and domain-specific dimensions. To drive semantic decomposition independently, we then swap the domain-invariant part across source and target domain samples with the same category and require that their inverted features be consistent at the class level with the original features. By treating domain-specific information as noise, we replace it with Gaussian noise and further regularize source model training by instance alignment, i.e., requiring the base features to be close to the corresponding reconstructed features. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on popular UDA benchmarks. The appendix and code are available at https://github.com/ayombeach/IndUDA.
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1236
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
Planning and Scheduling -> PS: Markov decisions processes
Robotics -> ROB: Learning in robotics
List of keywords 
Machine Learning -> ML: Reinforcement learning Planning and Scheduling -> PS: Markov decisions processes
Robotics -> ROB: Learning in robotics
1238
c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization
Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms, and real-world applications often impose some constraints, such as memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on 81 expensive HPO settings with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. See https://arxiv.org/abs/2211.14411 for the latest version with Appendix.
Machine Learning -> ML: Automated machine learning
List of keywords 
Machine Learning -> ML: Hyperparameter optimization Machine Learning -> ML: Automated machine learning
1239
Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on “Multiobjective Hyperparameter Optimization for Transformers”. See https://arxiv.org/abs/2212.06751 for the latest version with Appendix.
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Meta-learning
List of keywords 
Machine Learning -> ML: Hyperparameter optimization Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Meta-learning
1241
PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces
The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient. See https://arxiv.org/abs/2304.10255 for the latest version with Appendix.
Machine Learning -> ML: Automated machine learning
List of keywords 
Machine Learning -> ML: Hyperparameter optimization Machine Learning -> ML: Automated machine learning
1242
Null-Space Diffusion Sampling for Zero-Shot Point Cloud Completion
Point cloud completion aims at estimating the complete data of objects from degraded observations. Although existing completion methods achieve impressive performance, they rely heavily on degraded-complete data pairs for supervision. In this work, we propose a novel framework named Null-Space Diffusion Sampling (NSDS) to solve the point cloud completion task in a zero-shot manner. By leveraging a pre-trained point cloud diffusion model as the off-the-shelf generator, our sampling approach can generate desired completion outputs with the guidance of the observed degraded data without any extra training. Furthermore, we propose a tolerant loop mechanism to improve the quality of completion results for hard cases. Experimental results demonstrate that our zero-shot framework achieves completion performance superior to unsupervised methods and comparable to supervised methods in various degraded situations.
List of keywords 
Computer Vision -> CV: 3D computer vision
1250
Voice Guard: Protecting Voice Privacy with Strong and Imperceptible Adversarial Perturbation in the Time Domain
Adversarial examples are an emerging tool for voice privacy protection. By adding imperceptible noise to public audio, they prevent tamperers from using zero-shot Voice Conversion (VC) to synthesize high-quality speech with the target speaker's identity. However, many existing studies ignore the human perception characteristics of audio data, and it is challenging to generate strong and imperceptible adversarial audio. In this paper, we propose the Voice Guard defense method, which uses a novel method to advance the adversarial perturbation to the time domain to avoid the loss caused by cross-domain conversion. A psychoacoustic model is also introduced into the defense of VC for the first time, which greatly improves the disruption ability and concealment of adversarial audio. We also standardize the evaluation metrics of adversarial audio for the first time, combining multi-dimensional metrics to define the criteria for defense. We evaluate Voice Guard on several state-of-the-art zero-shot VC models. The experimental results show that our method can ensure the perceptual quality of adversarial audio while having a strong defense capability, and is far superior to previous works in terms of disruption ability and concealment.
Natural Language Processing -> NLP: Speech
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Security and privacy Natural Language Processing -> NLP: Speech
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
1251
Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding
Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmentary. The recent one-stage method aggregates only pixel contexts from image features to phrase features, which may incur semantic misalignment due to lacking object priors. To realize more comprehensive visual-linguistic interaction, we propose to enrich phrases with coupled pixel and object contexts by designing a Phrase-Pixel-Object Transformer Decoder (PPO-TD), where both fine-grained part details and coarse-grained entity clues are aggregated to phrase features. In addition, we also propose a Phrase-Object Contrastive Loss (POCL) to pull closer the matched phrase-object pairs and push away unmatched ones for aggregating more precise object contexts from more phrase-relevant object tokens. Extensive experiments on the PNG benchmark show our method achieves new state-of-the-art performance with large margins.
Computer Vision -> CV: Segmentation
List of keywords 
Computer Vision -> CV: Vision and language  Computer Vision -> CV: Segmentation
1252
Latent Processes Identification From Multi-View Time Series
Understanding the dynamics of time series data typically requires identifying the unique latent factors for data generation, a.k.a. latent process identification. Driven by the independence assumption, existing works have made great progress in handling single-view data. However, extending them to multi-view time series data is non-trivial because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independence assumption; (ii) the factors from different views generally overlap and are hard to aggregate into a complete set. In this work, we propose a novel framework MuLTI that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by the establishment of an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series. The code is available at https://github.com/lccurious/MuLTI.
Machine Learning -> ML: Multi-view learning
List of keywords 
Machine Learning -> ML: Causality Machine Learning -> ML: Multi-view learning
1265
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as context. In such a setting, a model must first retrieve relevant images from the pool and then answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (RETVQA for short). RETVQA is distinctly different from, and more challenging than, the traditionally studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and the images retrieved by our relevance encoder and generates free-form fluent answers. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: a multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, and an expected mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed RETVQA dataset, and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
Computer Vision -> CV: Applications
Machine Learning -> ML: Multi-modal learning
List of keywords 
Computer Vision -> CV: Vision and language  Computer Vision -> CV: Applications
Machine Learning -> ML: Multi-modal learning
1268
Towards Semantics- and Domain-Aware Adversarial Attacks
Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Safety and robustness Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
1272
Scaling Goal-based Exploration via Pruning Proto-goals
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty- or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.
Machine Learning -> ML: Deep reinforcement learning
List of keywords 
Machine Learning -> ML: Reinforcement learning Machine Learning -> ML: Deep reinforcement learning
1274
SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting
The success of Transformers in long time series forecasting (LTSF) can be attributed to their attention mechanisms and non-autoregressive (NAR) decoder structures, which capture long-range dependencies. However, time series data also contain abundant local temporal dependencies, which are often overlooked in the literature and significantly hinder forecasting performance. To address this issue, we introduce SMARTformer, which stands for SeMi-AutoRegressive Transformer. SMARTformer utilizes the Integrated Window Attention (IWA) and Semi-AutoRegressive (SAR) Decoder to capture global and local dependencies from both encoder and decoder perspectives. IWA conducts local self-attention in multi-scale windows and global attention across windows with linear complexity to achieve complementary clues in local and enlarged receptive fields. SAR generates subsequences iteratively, similar to autoregressive (AR) decoding, but refines the entire sequence in a NAR manner. This way, SAR benefits from both the global horizon of NAR and the local detail capturing of AR. We also introduce the Time-Independent Embedding (TIE), which better captures local dependencies by avoiding entanglements of various periods that can occur when directly adding positional embedding to value embedding. Our extensive experiments on five benchmark datasets demonstrate the effectiveness of SMARTformer against state-of-the-art models, achieving an improvement of 10.2% and 18.4% in multivariate and univariate long-term forecasting, respectively.
Machine Learning -> ML: Regression
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Applications
List of keywords 
Data Mining -> DM: Mining spatial and/or temporal data Machine Learning -> ML: Regression
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Applications
1310
Fairness via Group Contribution Matching
Fairness issues in Deep Learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) in the training process to understand what causes such bias to develop. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples are drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address the above issues, we propose an easy but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that GCM effectively improves fairness and significantly outperforms other methods.
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
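The oversampling baseline mentioned in the abstract above (drawing an equal number of samples from each subgroup at every training iteration) could be sketched roughly as follows. This is only an illustration of that baseline, not the authors' GCM method; the function name `balanced_batch`, the `group_of` callback, and the `per_group` parameter are assumptions introduced here.

```python
import random
from collections import defaultdict


def balanced_batch(samples, group_of, per_group, rng=None):
    """Draw `per_group` samples from every subgroup (hypothetical helper).

    `group_of` maps a sample to its sensitive-attribute value.
    Sampling is done with replacement, so small subgroups are oversampled.
    """
    rng = rng or random.Random()
    by_group = defaultdict(list)
    for sample in samples:
        by_group[group_of(sample)].append(sample)

    batch = []
    for members in by_group.values():
        # random.choices samples with replacement
        batch.extend(rng.choices(members, k=per_group))
    rng.shuffle(batch)
    return batch


# Usage: nine samples from group 0, one from group 1.
data = [("a", 0)] * 9 + [("b", 1)]
batch = balanced_batch(data, lambda s: s[1], per_group=4, rng=random.Random(0))
```

As the abstract notes, equal sample counts like these do not by themselves guarantee equal subgroup contributions to training.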
1327
Diagram Visual Grounding: Learning to See with Gestalt-Perceptual Attention
Diagram visual grounding aims to capture the correlation between language expressions and local objects in a diagram, and plays an important role in applications such as textbook question answering and cross-modal retrieval. Most diagrams consist of a few colors and simple geometries. This results in sparse low-level visual features, which further widens the gap between the low-level visual and high-level semantic features of diagrams. This phenomenon makes diagram visual grounding challenging. To solve the above issues, we propose a gestalt-perceptual attention model to align the diagram objects and language expressions. For low-level visual features, inspired by gestalt theory, which models the human visual system, we build a gestalt-perception graph network to complement the features learned by the traditional backbone network. For high-level semantic features, we design a multi-modal context attention mechanism to facilitate the interaction between diagrams and language expressions, so as to enhance the semantics of diagrams. Finally, guided by diagram features and linguistic embedding, the target query is gradually decoded to generate the coordinates of the referred object. By conducting comprehensive experiments on diagrams and natural images, we demonstrate that the proposed model achieves superior performance over the competitors. Our code will be released at https://github.com/AIProCode/GPA.
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
List of keywords 
Computer Vision -> CV: Vision and language  Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Representation learning
1340
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval through the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks because it lacks a decoder architecture and pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap exists between the CLIP representations of different modalities, and CLIP cannot model the offset of this gap, which causes concepts to fail to transfer across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With vision-free unsupervised training, Knight achieves state-of-the-art performance among zero-shot methods for image captioning and video captioning.
Computer Vision -> CV: Vision and language
Natural Language Processing -> NLP: Language generation
List of keywords 
Machine Learning -> ML: Multi-modal learning Computer Vision -> CV: Vision and language
Natural Language Processing -> NLP: Language generation
1361
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order "joint-joint", second-order "bone-joint", and high-order "hyperbone-joint" interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is available at https://github.com/hyer/HDFormer.
Computer Vision -> CV: Video analysis and understanding
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Video analysis and understanding
1363
Artificial Agents Inspired by Human Motivation Psychology for Teamwork in Hazardous Environments
Multi-agent literature explores personifying artificial agents with personality, emotions or cognitive biases to produce “typical”, believable agents. In this study, we demonstrate the potential of endowing artificial agents with a motivation, using human implicit motivation psychology theory that introduces 3 motive profiles (power, achievement and affiliation), to create diverse, risk-aware agents. We first devise a framework to model these motivated agents (or agents with any inherent behavior) that can activate different strategies depending on the circumstances. We conduct experiments on a fire-fighting task domain, evaluate how motivated teams perform, and draw conclusions on appropriate team compositions to be deployed in environments with different risk levels. Our framework generates predictable agents as their resulting behaviors align with the inherent characteristics of their motives. We find that motivational diversity within teams is beneficial in dynamic collaborative environments, especially as the task risk level increases. Furthermore, we observed that the best composition, in terms of the performance metrics used to evaluate team compositions, does not remain the same as the collaboration level required to achieve goals changes. These results have implications for future designs of risk-aware autonomous teams and Human-AI teams, as they highlight the prospects of creating better artificial teammates and performance gains that could be achieved through anthropomorphized motivated agents.
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Human-AI collaboration
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Human-AI collaboration
1378
Differentiable Model Selection for Ensemble Learning
Model selection is a strategy aimed at creating accurate and robust models by identifying the optimal model for classifying any particular input sample. This paper proposes a novel framework for differentiable selection of groups of models by integrating machine learning and combinatorial optimization. The framework is tailored for ensemble learning with a strategy that learns to combine the predictions of appropriately selected pre-trained ensemble models. It does so by modeling the ensemble learning task as a differentiable selection program trained end-to-end over a pretrained ensemble to optimize task performance. The proposed framework demonstrates its versatility and effectiveness, outperforming conventional and advanced consensus rules across a variety of classification tasks.
Machine Learning -> ML: Applications
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint optimization Machine Learning -> ML: Applications
1379
Backpropagation of Unrolled Solvers with Folded Optimization
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks. A central challenge in this setting is backpropagation through the solution of an optimization problem, which typically lacks a closed form. One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver. While flexible and general, unrolling can encounter accuracy and efficiency issues in practice. These issues can be avoided by analytical differentiation of the optimization, but current frameworks impose rigid requirements on the optimization problem’s form. This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation. Additionally, it proposes a unifying view of unrolling and analytical differentiation through optimization mappings. Experiments over various model-based learning tasks demonstrate the advantages of the approach both computationally and in terms of enhanced expressiveness.
Machine Learning -> ML: Applications
Machine Learning -> ML: Optimization
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint optimization Machine Learning -> ML: Applications
Machine Learning -> ML: Optimization
1384
Relation-enhanced DETR for Component Detection in Graphic Design Reverse Engineering
It is common practice for designers to create digital prototypes from a mock-up/screenshot. Reverse engineering graphic design by detecting its components (e.g., text, icon, button) helps expedite this process. This paper first conducts a statistical analysis to emphasize the importance of relations in graphic layouts, which further motivates us to incorporate relation modeling into component detection. Built on the current state-of-the-art DETR (DEtection TRansformer), we introduce a learnable relation matrix to model class correlations. Specifically, the matrix is added in the DETR decoder to update the query-to-query self-attention. Experimental results on three public datasets show that our approach achieves better performance than several strong baselines. We further visualize the learnt relation matrix and observe some reasonable patterns. Moreover, we show an application of component detection where we leverage the detection outputs as augmented training data for layout generation, which achieves promising results.
Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Arts and creativity Computer Vision -> CV: Applications
Computer Vision -> CV: Recognition (object detection, categorization)
1390
Reducing Communication for Split Learning by Randomized Top-k Sparsification
Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, to make the model generalize and converge better. This is done by selecting top-k elements with a large probability while also having a small probability to select non-top-k elements. Empirical results show that compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves a better model performance under the same compression level.
Machine Learning -> ML: Learning sparse models
Multidisciplinary Topics and Applications -> MDA: Security and privacy
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Learning sparse models
Multidisciplinary Topics and Applications -> MDA: Security and privacy
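The selection rule described in the abstract above (top-k elements chosen with a large probability, non-top-k elements with a small one) might look something like the following minimal sketch. The abstract does not give the exact sampling distribution, so the per-draw probability `alpha` of picking a non-top-k element, and the function name `randomized_top_k`, are assumptions made purely for illustration.

```python
import random


def randomized_top_k(values, k, alpha=0.1, rng=None):
    """Keep exactly k entries of `values` and zero out the rest.

    Assumed scheme (not taken from the paper): each of the k draws picks a
    remaining non-top-k index with probability `alpha`, and otherwise a
    remaining top-k index (ranked by absolute value).
    """
    rng = rng or random.Random()
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")

    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    top, rest = order[:k], order[k:]

    kept = set()
    for _ in range(k):
        # Fall back to the top-k pool when no non-top-k index is left.
        use_rest = bool(rest) and rng.random() < alpha
        pool = rest if use_rest else top
        kept.add(pool.pop(rng.randrange(len(pool))))

    return [v if i in kept else 0.0 for i, v in enumerate(values)]


# With alpha = 0 the sketch reduces to plain top-k sparsification.
plain = randomized_top_k([0.5, -3.0, 0.1, 2.0], k=2, alpha=0.0)
noisy = randomized_top_k([0.5, -3.0, 0.1, 2.0], k=2, alpha=0.5, rng=random.Random(1))
```

Either way only k values need to be communicated, so the compression level matches that of plain top-k.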
1412
Towards Generalizable Reinforcement Learning for Trade Execution
Optimized trade execution aims to sell (or buy) a given amount of assets in a given time at the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting, which prevents their real-world deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance.
Machine Learning -> ML: Deep reinforcement learning
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Finance Machine Learning -> ML: Deep reinforcement learning
1424
Dynamic Group Link Prediction in Continuous-Time Interaction Network
Recently, group link prediction has received increasing attention due to its important role in analyzing relationships between individuals and groups. However, most existing group link prediction methods emphasize static settings or only make cursory exploitation of historical information, so they fail to obtain good performance in dynamic applications. To this end, we attempt to solve the group link prediction problem in continuous-time dynamic scenes with fine-grained temporal information. We propose a novel continuous-time group link prediction method CTGLP to capture the patterns of future link formation between individuals and groups. A new graph neural network CTGNN is presented to learn the latent representations of individuals by biasedly aggregating neighborhood information. Moreover, we design an importance-based group modeling function to model the embedding of a group based on its known members. CTGLP eventually learns a probability distribution and predicts the link target. Experimental results on various datasets with and without unseen nodes show that CTGLP outperforms the state-of-the-art methods by 13.4% and 13.2% on average.
Data Mining -> DM: Networks
Multidisciplinary Topics and Applications -> MDA: Social sciences
List of keywords 
Data Mining -> DM: Mining graphs Data Mining -> DM: Networks
Multidisciplinary Topics and Applications -> MDA: Social sciences
1426
Contrastive Learning and Reward Smoothing for Deep Portfolio Management
In this study, we used reinforcement learning (RL) models to invest assets in order to earn returns. The models were trained to interact with a simulated environment based on historical market data and learn trading strategies. However, using deep neural networks based on the returns of each period can be challenging due to the unpredictability of financial markets. As a result, the policies learned from training data may not be effective when tested in real-world situations. To address this issue, we incorporated contrastive learning and reward smoothing into our training process. Contrastive learning allows the RL models to recognize patterns in asset states that may indicate future price movements. Reward smoothing, on the other hand, serves as a regularization technique to prevent the models from seeking immediate but uncertain profits. We tested our method against various traditional financial techniques and other deep RL methods, and found it to be effective in both the U.S. stock market and the cryptocurrency market. Our source code is available at https://github.com/sophialien/FinTech-DPM.
Machine Learning -> ML: Relational learning
Machine Learning -> ML: Representation learning
List of keywords 
Machine Learning -> ML: Reinforcement learning Machine Learning -> ML: Relational learning
Machine Learning -> ML: Representation learning
1441
ViT-P3DE: Vision Transformer Based Multi-Camera Instance Association with Pseudo 3D Position Embeddings
Multi-camera instance association, which identifies identical objects among multiple objects in multi-view images, is challenging due to several harsh constraints. To tackle this problem, most studies have employed CNNs as feature extractors but often fail under such harsh constraints. Inspired by Vision Transformer (ViT), we first develop a pure ViT-based framework for robust feature extraction through self-attention and residual connection. We then propose two novel methods to achieve robust feature learning. First, we introduce learnable pseudo 3D position embeddings (P3DEs) that represent the 3D location of an object in the world coordinate system, which is independent of the harsh constraints. To generate P3DEs, we encode the camera ID and the object’s 2D position in the image using embedding tables. We then build a framework that trains P3DEs to represent an object’s 3D position in a weakly supervised manner. Second, we also utilize joint patch generation (JPG). During patch generation, JPG considers an object and its surroundings as a single input patch to reinforce the relationship information between two features. Ultimately, experimental results demonstrate that both ViT-P3DE and ViT-P3DE with JPG achieve state-of-the-art performance and significantly outperform existing works, especially when dealing with extremely harsh constraints.
Computer Vision -> CV: Recognition (object detection, categorization)
List of keywords 
Computer Vision -> CV: Applications Computer Vision -> CV: Recognition (object detection, categorization)
1442
Black-Box Data Poisoning Attacks on Crowdsourcing
Understanding the vulnerability of label aggregation against data poisoning attacks is key to ensuring data quality in crowdsourced label collection. State-of-the-art attack mechanisms generally assume full knowledge of the aggregation models while failing to consider the flexibility of malicious workers in selecting which instances to label. Such a setup limits the applicability of the attack mechanisms and impedes further improvement of their success rate. This paper introduces a black-box data poisoning attack framework that finds the optimal strategies for instance selection and labeling to attack unknown label aggregation models in crowdsourcing. We formulate the attack problem on top of a generic formalization of label aggregation models and then introduce a substitution approach that attacks a substitute aggregation model in replacement of the unknown model. Through extensive validation on multiple real-world datasets, we demonstrate the effectiveness of both instance selection and model substitution in improving the success rate of attacks.
Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Robustness
List of keywords 
Humans and AI -> HAI: Human-AI collaboration Humans and AI -> HAI: Human computation and crowdsourcing
Machine Learning -> ML: Robustness
1456
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts always consider boosting a single task unilaterally while neglecting others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish a hierarchical dual-task-driven deep model to bridge these tasks. Concretely, we first construct an image fusion module to fuse complementary characteristics and cascade dual task-related modules, including a discriminator for visual effects and a semantic network for feature measurement. We provide a bi-level perspective to formulate image fusion and follow-up downstream tasks. To incorporate distinct task-related responses for image fusion, we consider image fusion as a primary goal and dual modules as learnable constraints. Furthermore, we develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning. Extensive experiments demonstrate the superiority of our method, which not only produces visually pleasant fused results but also yields significant improvements in detection and segmentation over state-of-the-art approaches.
Computer Vision -> CV: Scene analysis and understanding
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Scene analysis and understanding
1461
SWAT: Spatial Structure Within and Among Tokens
Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years. Such methods usually have a common pipeline: a tokenization method, followed by a set of layers/blocks for information mixing, both within and among tokens. When image patches are converted into tokens, they are often flattened, discarding the spatial structure within each patch. As a result, any processing that follows (e.g., multi-head self-attention) may fail to recover and/or benefit from such information. In this paper, we argue that models can have significant gains when spatial structure is preserved during tokenization and is explicitly used during the mixing stage. We propose two key contributions: (1) Structure-aware Tokenization and (2) Structure-aware Mixing, both of which can be combined with existing models with minimal effort. We introduce a family of models (SWAT), showing improvements over the likes of DeiT, MLP-Mixer and Swin Transformer, across multiple benchmarks including ImageNet classification and ADE20K segmentation. Our code is available at github.com/kkahatapitiya/SWAT.
Computer Vision -> CV: Representation learning
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Representation learning
1476
Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting
Quality-Diversity (QD) algorithms, a subset of evolutionary algorithms, maintain an archive (i.e., a set of solutions) and simulate the natural evolution process through iterative selection and reproduction, with the goal of generating a set of high-quality and diverse solutions. Though they have found many successful applications in reinforcement learning, QD algorithms often select the parent solutions uniformly at random, which lacks selection pressure and may limit performance. Recent studies have treated each type of behavior of a solution as an objective and selected the parent solutions based on Multi-objective Optimization (MO), which is a natural idea but has not led to the expected performance. This paper gives the reason for the first time, and then proposes a new MO-based selection method by non-surrounded-dominated sorting (NSS), which considers all possible directions of the behaviors, and thus can generate diverse solutions over the whole behavior space. By combining NSS with the most widespread QD algorithm, MAP-Elites, we perform experiments on synthetic functions and several complex tasks (i.e., QDGym, robotic arm, and Mario environment generation), showing that NSS achieves better performance than not only other MO-based selection methods but also state-of-the-art selection methods in QD.
Machine Learning -> ML: Reinforcement learning
Search -> S: Evolutionary computation
List of keywords 
Machine Learning -> ML: Evolutionary learning Machine Learning -> ML: Reinforcement learning
Search -> S: Evolutionary computation
1479
Stability and Generalization of lp-Regularized Stochastic Learning for GCN
Graph convolutional networks (GCN) are viewed as one of the most popular representations among the variants of graph neural networks over graph data and have shown powerful performance in empirical experiments. It is known that l2-based graph smoothing enforces the global smoothness of GCN, while (soft) l1-based sparse graph learning tends to promote signal sparsity to trade for discontinuity. This paper aims to quantify the trade-off of GCN between smoothness and sparsity, with the help of a general lp-regularized (1 < p <= 2) stochastic learning scheme proposed here. While stability-based generalization analyses have been given in prior work for objective functions with second derivatives, our lp-regularized learning scheme does not satisfy such a smoothness condition. To tackle this issue, we propose a novel SGD proximal algorithm for GCNs with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with the lp-regularized stochastic learning by analyzing the stability of our SGD proximal algorithm. We conduct multiple empirical experiments to validate our theoretical findings.
List of keywords 
Uncertainty in AI -> UAI: Graphical models
1493
On the Reuse Bias in Off-Policy Reinforcement Learning
Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias of IS — the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that the off-policy evaluation and optimization of the current policy with the data from the replay buffer result in an overestimation of the objective, which may cause an erroneous gradient update and degenerate the performance. We further provide a high-probability upper bound of the Reuse Bias and show that controlling one term of the upper bound can control the Reuse Bias by introducing the concept of stability for off-policy algorithms. Based on these analyses, we present a novel yet simple Bias-Regularized Importance Sampling (BIRIS) framework along with practical algorithms, which can alleviate the negative impact of the Reuse Bias, and show that our BIRIS can significantly reduce the Reuse Bias empirically. Moreover, extensive experimental results show that our BIRIS-based methods can significantly improve the sample efficiency on a series of continuous control tasks in MuJoCo.
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
List of keywords 
Machine Learning -> ML: Reinforcement learning Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: Markov decisions processes
1505
Clustered-patch Element Connection for Few-shot Learning
The weak feature representation problem has long affected the performance of few-shot classification. To alleviate this problem, recent work builds connections between support and query instances through embedding patch features to generate discriminative representations. However, we observe that there exist semantic mismatches (foreground/background) among these local patches, because the location and size of the target object are not fixed. Worse still, these mismatches result in unreliable similarity confidences, and complex dense connection exacerbates the problem. Accordingly, we propose a novel Clustered-patch Element Connection (CEC) layer to correct the mismatch problem. The CEC layer leverages Patch Cluster and Element Connection operations to collect and establish reliable connections with high similarity patch features, respectively. Moreover, we propose a CECNet, including a CEC layer based attention module and distance metric. The former is utilized to generate a more discriminative representation benefiting from the global clustered-patch features, and the latter is introduced to reliably measure the similarity between pair-features. Extensive experiments demonstrate that our CECNet outperforms the state-of-the-art methods on classification benchmarks. Furthermore, our CEC approach can be extended to few-shot segmentation and detection tasks, where it achieves competitive performance.
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1510
Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification
Mammogram images are important for breast cancer screening and are typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information for clinical decisions. However, previous methods mostly learn features from the two views independently, which violates the clinical knowledge and ignores the importance of dual-view correlation in the feature learning. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep feature maps for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly correlated with each other. Experimental results on the two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that the DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms previous state-of-the-art methods for classifying a whole mammogram as malignant or not.
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Multi-view learning
List of keywords 
Computer Vision -> CV: Biomedical image analysis Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Multi-view learning
1529
Image Composition with Depth Registration
Handling occlusions is still a challenging problem for image composition. It always requires the source contents to be completely in front of the target contents or needs manual interventions to adjust occlusions, which is very tedious. Though several methods have suggested exploiting priors or learning techniques for promoting occlusion determination, their potential is quite limited. This paper addresses the challenge by presenting a depth registration method for merging the source contents seamlessly into the 3D space that the target image represents. Thus, the occlusions between the source contents and target contents can be conveniently handled through pixel-wise depth comparisons, allowing the user to more efficiently focus on the designs for image composition. Experimental results show that we can conveniently handle occlusions in image composition and improve efficiency by about 4 times compared to Photoshop.
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
List of keywords 
Computer Vision -> CV: Scene analysis and understanding    Multidisciplinary Topics and Applications -> MDA: Arts and creativity
1539
IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse
Despite their success, generative adversarial networks (GANs) still suffer from mode collapse, i.e., the generator can only map latent variables to a partial set of modes in the target distribution. In this paper, we analyze and seek to regularize this issue with an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property referring to the target distribution for generation can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples {z} obey IID, the generations {G(z)} may not necessarily be IID samples from the target distribution. Based on this observation, considering a necessary condition of IID generation, we propose a new loss to encourage the closeness between the inverse samples of real data and the Gaussian source in the latent space to regularize the generation to be IID from the target distribution. The logic is that the inverse samples from target data should also be IID in the source distribution. Experiments on both synthetic and real-world data show the effectiveness of our model.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
List of keywords 
Machine Learning -> ML: Generative adversarial networks Computer Vision -> CV: Neural generative models, auto encoders, GANs
1540
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification
Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features’ prototypes and labels’ semantics, thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e., the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It’s necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. The content of each node is then reconstructed into a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph’s instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF in comparison with state-of-the-art baselines.
Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Mining graphs
List of keywords 
Data Mining -> DM: Applications Data Mining -> DM: Knowledge graphs and knowledge base completion
Data Mining -> DM: Mining graphs
1542
Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether prompts in fine-tuning can learn knowledge-aware prompts from pre-training, by designing two different sets of prompts for the pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt) approach for video captioning, which first efficiently pre-trains a video-language model to extract key information (e.g., actions and objects) with a flexibly generated Knowledge-Aware Prompt (KAP). Then, we design a Video-Language Prompt (VLP) to transfer the knowledge from the knowledge-aware prompts and fine-tune the model to generate full captions. Experimental results show the superior performance of our approach over several state-of-the-art baselines. We further demonstrate that the video-language prompts are well learned from the knowledge-aware prompts.
Computer Vision -> CV: Vision and language
List of keywords 
Computer Vision -> CV: Video analysis and understanding    Computer Vision -> CV: Vision and language
1560
Abstraction of Nondeterministic Situation Calculus Action Theories
We develop a general framework for abstracting the behavior of an agent that operates in a nondeterministic domain, i.e., where the agent does not control the outcome of the nondeterministic actions, based on the nondeterministic situation calculus and the ConGolog programming language. We assume that we have both an abstract and a concrete nondeterministic basic action theory, and a refinement mapping which specifies how abstract actions, decomposed into agent actions and environment reactions, are implemented by concrete ConGolog programs. This new setting supports strategic reasoning and strategy synthesis, by allowing us to quantify separately on agent actions and environment reactions. We show that if the agent has a (strong FOND) plan/strategy to achieve a goal/complete a task at the abstract level, and it can always execute the nondeterministic abstract actions to completion at the concrete level, then there exists a refinement of it that is a (strong FOND) plan/strategy to achieve the refinement of the goal/task at the concrete level.
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
List of keywords 
Knowledge Representation and Reasoning -> KRR: Reasoning about actions Agent-based and Multi-agent Systems -> MAS: Agent theories and models
1572
Overlooked Implications of the Reconstruction Loss for VAE Disentanglement
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.
Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Explainable/Interpretable machine learning
List of keywords 
Machine Learning -> ML: Representation learning Machine Learning -> ML: Autoencoders
Machine Learning -> ML: Unsupervised learning
Machine Learning -> ML: Explainable/Interpretable machine learning
1580
On Conditional and Compositional Language Model Differentiable Prompting
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. 
In this work, we investigate conditional and compositional differentiable prompting.
We propose a new model, Prompt Production System (ProPS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. 
Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules — neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. 
We present extensive empirical and theoretical analysis and show that ProPS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.
Machine Learning -> ML: Neuro-symbolic methods
List of keywords 
Machine Learning -> ML: Multi-task and transfer learning Machine Learning -> ML: Neuro-symbolic methods
1585
Domain-Adaptive Self-Supervised Face & Body Detection in Drawings
Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Self-supervised Learning
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Machine Learning -> ML: Self-supervised Learning
1588
Accurate MRI Reconstruction via Multi-Domain Recurrent Networks
In recent years, deep convolutional neural networks (CNNs) have become dominant in MRI reconstruction from undersampled k-space. However, most existing CNN methods reconstruct the undersampled images either in the spatial domain or in the frequency domain, neglecting the correlation between these two domains. This hinders further improvement in reconstruction performance. To tackle this issue, in this work, we propose a new multi-domain recurrent network (MDR-Net) with multi-domain learning (MDL) blocks as its basic units to reconstruct the undersampled MR image progressively. Specifically, the MDL block interactively processes the local spatial features and the global frequency information to facilitate complementary learning, leading to fine-grained feature generation. Furthermore, we introduce an effective frequency-based loss to narrow the frequency spectrum gap, compensating for over-smoothness caused by the widely used spatial reconstruction loss. Extensive experiments on public fastMRI datasets demonstrate that our MDR-Net consistently outperforms other competitive methods and is able to provide more details.
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: Biomedical image analysis Computer Vision -> CV: Applications
1590
A Fast Algorithm for Consistency Checking Partially Ordered Time
Partially ordered models of time occur naturally in applications where agents/processes cannot perfectly communicate with each other, and can be traced back to the seminal work of Lamport. In this paper we consider the problem of deciding if a (likely incomplete) description of a system of events is consistent, the network consistency problem for the point algebra of partially ordered time (POT). While the classical complexity of this problem has been fully settled, comparatively little is known of the fine-grained complexity of POT except that it can be solved in O*((0.368n)^n) time by enumerating ordered partitions. We construct a much faster algorithm with a run-time bounded by O*((0.26n)^n), which, e.g., is roughly 1000 times faster than the naive enumeration algorithm in a problem with 20 events. This is achieved by a sophisticated enumeration of structures similar to total orders, which are then greedily expanded toward a solution. While similar ideas have been explored earlier for related problems, it turns out that the analysis for POT is non-trivial and requires significant new ideas.
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
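The "roughly 1000 times faster" figure in the abstract above can be checked with a short calculation. The sketch below assumes the two bounds are compared directly and that the polynomial factors hidden by the O* notation are ignored; the function name `speedup` is hypothetical and not taken from the paper.

```python
def speedup(n: int, old_base: float = 0.368, new_base: float = 0.26) -> float:
    """Ratio (old_base*n)^n / (new_base*n)^n.

    The factor n^n cancels, so the ratio simplifies to (old_base/new_base)^n.
    Polynomial factors hidden by O* are ignored (an assumption of this sketch).
    """
    return (old_base / new_base) ** n


if __name__ == "__main__":
    # For 20 events the ratio comes out on the order of 10^3.
    print(f"speedup for n=20: {speedup(20):.0f}")
```

Because the ratio grows exponentially in n, the gap between the two bounds widens quickly as more events are added.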
1593
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into the super-resolution branch in an adaptive manner using the proposed adaptive fusion module. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods. Code is available at https://github.com/csguoh/LEMMA.
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
1596
RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search
Text-based person search aims to retrieve the specified person images given a textual description. The key to tackling such a challenging task is to learn powerful multi-modal representations. Towards this, we propose a Relation and Sensitivity aware representation learning method (RaSa), including two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). For one thing, existing methods cluster representations of all positive pairs without distinction and overlook the noise problem caused by the weak positive pairs where the text and the paired image have noise correspondences, thus leading to overfitting learning. RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs). For another thing, learning invariant representation under data augmentation (i.e., being insensitive to some transformations) is a general practice for improving representation’s robustness in existing methods. Beyond that, we encourage the representation to perceive the sensitive transformation by SA (i.e., learning to detect the replaced words), thus promoting the representation’s robustness. Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively. Code is available at: https://github.com/Flame-Chasers/RaSa.
Computer Vision -> CV: Image and video retrieval
Machine Learning -> ML: Multi-modal learning
List of keywords 
Computer Vision -> CV: Vision and language  Computer Vision -> CV: Image and video retrieval
Machine Learning -> ML: Multi-modal learning
1598
Improved Algorithms for Allen’s Interval Algebra by Dynamic Programming with Sublinear Partitioning
Allen’s interval algebra is one of the most well-known calculi in qualitative temporal reasoning with numerous applications in artificial intelligence. Very recently, there has been a surge of improvements in the fine-grained complexity of NP-hard reasoning tasks in this algebra, which has improved the running time from the naive 2^O(n^2) to O*((1.0615n)^n), and even faster algorithms are known for unit intervals and the case when we have a bounded number of overlapping intervals.
Despite these improvements the best known lower bound is still only 2^o(n) under the exponential-time hypothesis and major improvements in either direction seemingly require fundamental advances in computational complexity. 
In this paper we propose a novel framework for solving NP-hard qualitative reasoning problems which we refer to as dynamic programming with sublinear partitioning. 
 Using this technique we obtain a major improvement of O*((cn/log(n))^n) for Allen’s interval algebra. To demonstrate that the technique is applicable to further problem domains we apply it to a problem in qualitative spatial reasoning, the cardinal direction calculus, and solve it in O*((cn/log(n))^(2n/3)) time. Hence, not only do we significantly advance the state-of-the-art for NP-hard qualitative reasoning problems, but obtain a novel algorithmic technique that is likely applicable to many problems where 2^O(n) time algorithms are unlikely.
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
1603
Error in the Euclidean Preference Model
Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their "ideal point", as measured by the Euclidean metric. However, previous work has shown there are ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Machine Learning -> ML: Learning preferences or rankings
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Machine Learning -> ML: Learning preferences or rankings
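As an illustration of the Euclidean preference model described in the abstract above, the following minimal sketch derives each individual's ordinal ranking from distances to an "ideal point". The coordinates, the names `ranking`, `alternatives`, `voter_left` and `voter_right` are all made up for illustration and are not from the paper.

```python
import math


def ranking(ideal, alternatives):
    """Return alternative names ordered from most to least preferred.

    An individual prefers alternatives that lie closer to their ideal point,
    measured with the Euclidean metric.
    """
    return sorted(alternatives, key=lambda name: math.dist(ideal, alternatives[name]))


# Hypothetical 2-D embeddings of three alternatives.
alternatives = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0)}

# Two individuals with different ideal points induce different rankings.
voter_left = ranking((0.2, 0.0), alternatives)
voter_right = ranking((2.5, 0.0), alternatives)
```

A preference profile is Euclidean in a given dimension if ideal points and alternative positions can be chosen so that every individual's ranking arises this way; the abstract concerns profiles for which no such placement exists in low dimension.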
1604
A Fast Adaptive Randomized PCA Algorithm
It is desirable to adaptively determine the number of dimensions (rank) for PCA according to a given tolerance of low-rank approximation error. In this work, we aim to develop a fast algorithm solving this adaptive PCA problem. We propose to replace the QR factorization in the randQB_EI algorithm with matrix multiplication and inversion of small matrices, and propose a new error indicator to incrementally evaluate approximation error in Frobenius norm. Combining the shifted power iteration technique for better accuracy, we finally build up an algorithm named farPCA. Experimental results show that farPCA is much faster than the baseline methods (randQB_EI, randUBV and svds) in the practical setting of multi-thread computing, while producing nearly optimal results of adaptive PCA.
Data Mining -> DM: Big data and scalability
Data Mining -> DM: Theoretical foundations of data mining
List of keywords 
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction Data Mining -> DM: Big data and scalability
Data Mining -> DM: Theoretical foundations of data mining
1621
Specifying and Testing k-Safety Properties for Machine-Learning Models
Machine-learning models are becoming increasingly prevalent in our lives, for instance assisting in image-classification or decision-making tasks. Consequently, the reliability of these models is of critical importance and has resulted in the development of numerous approaches for validating and verifying their robustness and fairness. However, beyond such specific properties, it is challenging to specify, let alone check, general functional-correctness expectations from models. In this paper, we take inspiration from specifications used in formal methods, expressing functional-correctness properties by reasoning about k different executions—so-called k-safety properties. Considering a credit-screening model of a bank, the expected property that "if a person is denied a loan and their income decreases, they should still be denied the loan" is a 2-safety property. Here, we show the wide applicability of k-safety properties for machine-learning models and present the first specification language for expressing them. We also operationalize the language in a framework for automatically validating such properties using metamorphic testing. Our experiments show that our framework is effective in identifying property violations, and that detected bugs could be used to train better models.
Agent-based and Multi-agent Systems -> MAS: Engineering methods, platforms, languages and tools
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Software engineering Agent-based and Multi-agent Systems -> MAS: Engineering methods, platforms, languages and tools
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
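The 2-safety loan property quoted in the abstract above relates two executions of the same model, which is what makes it a natural fit for metamorphic testing. The sketch below is a hand-written check under stated assumptions, not the paper's specification language or framework: `approve_loan`, `buggy_model`, `find_violations`, the feature set and all thresholds are hypothetical stand-ins.

```python
import random


def approve_loan(income: float, debt: float) -> bool:
    """Hypothetical stand-in model that is monotone in income."""
    return income - 0.5 * debt >= 30_000


def buggy_model(income: float, debt: float) -> bool:
    """Hypothetical faulty model: it wrongly approves very low incomes."""
    return income >= 30_000 or income < 5_000


def find_violations(model, trials: int = 1000, seed: int = 0):
    """Search for pairs of runs that violate the 2-safety property:
    denied at some income, yet approved at a lower income (same debt)."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        income = rng.uniform(0, 100_000)
        debt = rng.uniform(0, 50_000)
        lower_income = income * rng.uniform(0.0, 1.0)  # metamorphic transformation
        if not model(income, debt) and model(lower_income, debt):
            violations.append((income, lower_income, debt))
    return violations


if __name__ == "__main__":
    print(len(find_violations(approve_loan)), "violations for the monotone model")
    print(len(find_violations(buggy_model)), "violations for the faulty model")
```

No ground-truth label is needed: the check compares the model's output on an input with its output on a transformed input, so a violation is evidence of a bug regardless of which of the two decisions is the correct one.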
1624
Hierarchical Transformer for Scalable Graph Learning
Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency.
List of keywords 
Machine Learning -> ML: Sequence and graph learning
1626
PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
Accurately perceiving instances and predicting their future motion are key tasks for autonomous vehicles, enabling them to navigate safely in complex urban traffic. While bird’s-eye view (BEV) representations are commonplace in perception for autonomous driving, their potential in a motion prediction setting is less explored. Existing approaches for BEV instance prediction from surround cameras rely on a multi-task auto-regressive setup coupled with complex post-processing to predict future instances in a spatio-temporally consistent manner. In this paper, we depart from this paradigm and propose an efficient novel end-to-end framework named PowerBEV, which differs in several design choices aimed at reducing the inherent redundancy in previous methods. First, rather than predicting the future in an auto-regressive fashion, PowerBEV uses a parallel, multi-scale module built from lightweight 2D convolutional networks. Second, we show that segmentation and centripetal backward flow are sufficient for prediction, simplifying previous multi-task objectives by eliminating redundant output modalities. Building on this output representation, we propose a simple, flow warping-based post-processing approach which produces more stable instance associations across time. Through this lightweight yet powerful design, PowerBEV outperforms state-of-the-art baselines on the NuScenes Dataset and poses an alternative paradigm for BEV instance prediction. We made our code publicly available at: https://github.com/EdwardLeeLPZ/PowerBEV.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Segmentation
1630
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
In the realm of autonomous driving, real-time perception or streaming perception remains under-explored. This research introduces DAMO-StreamNet, a novel framework that merges the cutting-edge elements of the YOLO series with a detailed examination of spatial and temporal perception techniques. DAMO-StreamNet’s main inventions include: (1) a robust neck structure employing deformable convolution, bolstering receptive field and feature alignment capabilities; (2) a dual-branch structure synthesizing short-path semantic features and long-path temporal features, enhancing the accuracy of motion state prediction; (3) logits-level distillation facilitating efficient optimization, which aligns the logits of teacher and student networks in semantic space; and (4) a real-time prediction mechanism that updates the features of support frames with the current frame, providing smooth streaming perception during inference. Our testing shows that DAMO-StreamNet surpasses current state-of-the-art methodologies, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without requiring additional data. This study not only establishes a new standard for real-time perception but also offers valuable insights for future research. The source code is at https://github.com/zhiqic/DAMO-StreamNet.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Recognition (object detection, categorization)
1633
Differentiable Economics for Randomized Affine Maximizer Auctions
A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, such an architecture does not exist. There are single-bidder approaches (MenuNet, RochetNet) which are always strategyproof and can represent optimal mechanisms. RegretNet is multi-bidder and can approximate any mechanism, but is only approximately strategyproof. We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism. This architecture is the classic affine maximizer auction (AMA), modified to offer lotteries. By using the gradient-based optimization tools of differentiable economics, we can now train lottery AMAs, competing with or outperforming prior approaches in revenue.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
1635
A Bitwise GAC Algorithm for Alldifferent Constraints
The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is finding and enumerating all the strongly connected components (SCCs) of the graph model. This requires a large number of complex data structures to maintain the node information, leading to a large overhead in both time and memory. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. Our key observation is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCC it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead and makes it easier to combine existing optimizations so that they work synergistically. Our experiments show that AllDiffbit outperforms the state-of-the-art GAC algorithms by over 60%.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
1639
Bipolar Abstract Dialectical Frameworks Are Covered by Kleene’s Three-valued Logic
Abstract dialectical frameworks (ADFs) are one of the most powerful generalizations of classical Dung-style argumentation frameworks (AFs). The additional expressive power comes with an increase in computational complexity, namely one level up in the polynomial hierarchy in comparison to their AF counterparts. However, there is one important subclass, so-called bipolar ADFs (BADFs), which are as complex as classical AFs while offering strictly more modeling capacity. This property makes BADFs very attractive from a knowledge representation point of view and is the main reason why this class has received much attention recently. The semantics of ADFs rely on the Gamma-operator, which takes a three-valued interpretation as input and returns a new one. However, to obtain the output, the original definition requires considering every two-valued completion of the given three-valued interpretation. In this paper we formally prove that in the case of BADFs we may bypass this computationally intensive procedure by applying Kleene’s three-valued logic K. To this end, we introduce the so-called bipolar disjunctive normal form, which is simply a disjunctive normal form in which every atom used possesses either a positive or a negative polarity. We then show that, first, this normal form is expressive enough to represent any BADF and, second, the computation can be done via Kleene’s K instead of dealing with two-valued completions. Inspired by the main correspondence result, we present some first experiments showing the computational benefit of using Kleene.
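To make the contrast in the abstract above concrete, the sketch below evaluates a formula in disjunctive normal form in two ways: directly with Kleene's strong three-valued connectives, and by enumerating all two-valued completions of the undefined atoms. It is a minimal illustration and not the authors' implementation; the encoding (`None` for "undefined", literals as `(atom, polarity)` pairs) and all function names are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

TV = Optional[bool]            # True, False, or None for "undefined"
Literal = Tuple[str, bool]     # (atom, polarity); polarity False means negated
DNF = List[List[Literal]]      # disjunction of conjunctions of literals


def k_not(a: TV) -> TV:
    return None if a is None else (not a)


def k_and(a: TV, b: TV) -> TV:
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a: TV, b: TV) -> TV:
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def eval_kleene(dnf: DNF, interp: Dict[str, TV]) -> TV:
    """Evaluate the DNF once, using Kleene's three-valued connectives."""
    result: TV = False
    for term in dnf:
        value: TV = True
        for atom, positive in term:
            lit = interp[atom] if positive else k_not(interp[atom])
            value = k_and(value, lit)
        result = k_or(result, value)
    return result


def eval_by_completions(dnf: DNF, interp: Dict[str, TV]) -> TV:
    """Evaluate under every two-valued completion; return the common value,
    or None if the completions disagree. Exponential in the undefined atoms."""
    undefined = [a for a, v in interp.items() if v is None]
    outcomes = set()
    for choice in product([True, False], repeat=len(undefined)):
        completed = dict(interp)
        completed.update(zip(undefined, choice))
        outcomes.add(eval_kleene(dnf, completed))  # classical on two-valued input
    return outcomes.pop() if len(outcomes) == 1 else None
```

For a bipolar formula such as (a AND NOT b) OR c, both evaluations agree on the inputs tried in this sketch, while for the non-bipolar formula a OR NOT a with a undefined, Kleene's logic returns undefined and the completion-based evaluation returns true.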
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
1654
SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation
Referring image segmentation aims to segment an object out of an image via a specific language expression. The main concept is establishing global visual-linguistic relationships to locate the object and identify boundaries using details of the image. Recently, various Transformer-based techniques have been proposed to efficiently leverage long-range cross-modal dependencies,  enhancing performance for referring segmentation. However, existing methods consider visual feature extraction and cross-modal fusion separately, resulting in insufficient visual-linguistic alignment in semantic space. In addition, they employ sequential structures and hence lack multi-scale information interaction. To address these limitations, we propose a Scale-Wise Language-Guided Vision Transformer (SLViT) with two appealing designs: (1) Language-Guided Multi-Scale Fusion Attention, a novel attention mechanism module for extracting rich local visual information and modeling global visual-linguistic relationships in an integrated manner. (2) An Uncertain Region Cross-Scale Enhancement module that can identify regions of high uncertainty using linguistic features and refine them via aggregated multi-scale features. We have evaluated our method on three benchmark datasets. The experimental results demonstrate that SLViT surpasses state-of-the-art methods with lower computational cost. The code is publicly available at: https://github.com/NaturalKnight/SLViT.
List of keywords
Computer Vision -> CV: Vision and language
Computer Vision -> CV: Segmentation
1666
Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black-box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.
List of keywords
Machine Learning -> ML: Reinforcement learning
Machine Learning -> ML: Learning theory
1679
Temporal Datalog with Existential Quantification
Existential rules, also known as tuple-generating dependencies (TGDs) or Datalog+/- rules, are heavily studied in the communities of Knowledge Representation and Reasoning, Semantic Web, and Databases, due to their rich modelling capabilities. In this paper we consider TGDs in the temporal setting, by introducing and studying DatalogMTLE, an extension of metric temporal Datalog (DatalogMTL) obtained by allowing existential rules in programs. We show that DatalogMTLE is undecidable even in the restricted cases of guarded and weakly-acyclic programs. To address this issue we introduce uniform semantics which, on the one hand, is well-suited for modelling temporal knowledge as it prevents unintended value invention and, on the other hand, provides decidability of reasoning; in particular, reasoning becomes 2-EXPSPACE-complete for weakly-acyclic programs but remains undecidable for guarded programs. We provide an implementation for the decidable case and demonstrate its practical feasibility. Thus we obtain an expressive, yet decidable, rule language and a system which is suitable for complex temporal reasoning with existential rules.
List of keywords
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
1685
Learning Heuristically-Selected and Neurally-Guided Feature for Age Group Recognition Using Unconstrained Smartphone Interaction
Owing to the boom of smartphone industries, the expansion of phone users has also been significant. Besides adults, children and elders have also begun to join the population of daily smartphone users. Such an expansion indeed facilitates the further exploration of the versatility and flexibility of digitization. However, these new users may also be susceptible to issues such as addiction, fraud, and insufficient accessibility. To fully utilize the capability of mobile devices without breaching personal privacy, we build the first corpus for age group recognition on smartphones with more than 1,445,087 unrestricted actions from 2,100 subjects. Then a series of heuristically-selected and neurally-guided features are proposed to increase the separability of the above dataset. Finally, we develop AgeCare, the first implicit and continuous system incorporated with bottom-to-top functionality without any restriction on user-phone interaction scenarios, for accurate age group recognition and age-tailored assistance on smartphones. Our system performs impressively well on this dataset and significantly surpasses the state-of-the-art methods.
List of keywords
Humans and AI -> HAI: Human-computer interaction
Humans and AI -> HAI: Personalization and user modeling
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
1698
Unbiased Risk Estimator to Multi-Labeled Complementary Label Learning
Multi-label learning (MLL) usually requires assigning multiple relevant labels to each instance. While a fully supervised MLL dataset needs a large amount of labeling effort, using complementary labels can help alleviate this burden. However, current approaches to learning from complementary labels are mainly designed for multi-class learning and assume that each instance has a single relevant label. This means that these approaches cannot be easily applied to MLL when only complementary labels are provided, where the number of relevant labels is unknown and can vary across instances. In this paper, we first propose the unbiased risk estimator for the multi-labeled complementary label learning (MLCLL) problem. We also provide an estimation error bound to ensure the convergence of the empirical risk estimator. In some cases, the unbiased estimator may give unbounded gradients for certain loss functions and result in overfitting. To mitigate this problem, we improve the risk estimator by minimizing a proper loss function, which has been shown to improve gradient updates. Our experimental results demonstrate the effectiveness of the proposed approach on various datasets.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Weakly supervised learning
1716
Exploring Leximin Principle for Fair Core-Selecting Combinatorial Auctions: Payment Rule Design and Implementation
Core-selecting combinatorial auctions (CAs) restrict the auction outcome to the core, so that no coalition can improve its utility by colluding. The minimum-revenue-core (MRC) rule is a widely used core-selecting payment rule that maximizes the total utility of all bidders. However, the MRC rule can suffer from severe unfairness since it ignores individuals’ utilities. To address this limitation, we propose to explore the leximin principle to achieve fairness in core-selecting CAs, since the leximin principle prefers to maximize the utility of the worst-off. The resulting bidder-leximin-optimal (BLO) payment rule is then theoretically analyzed, and an effective algorithm is provided to compute the BLO outcome. Moreover, we conduct extensive experiments to show that our algorithm returns fairer utility distributions and is faster than existing algorithms for core-selecting payment rules.
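The leximin principle mentioned in the abstract above can be illustrated on its own: two utility profiles are compared by sorting each in ascending order and comparing the sorted vectors lexicographically, so the worst-off bidder is compared first, then the second worst-off, and so on. The sketch below shows only this ordering over hypothetical utility profiles; it does not model the core constraints or the BLO payment rule itself, and the function names are illustrative.

```python
from typing import List, Sequence


def leximin_key(utilities: Sequence[float]) -> List[float]:
    """Sort ascending so that lexicographic comparison starts at the worst-off."""
    return sorted(utilities)


def leximin_prefers(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if profile u is strictly leximin-better than profile v.
    Assumes both profiles cover the same number of bidders."""
    return leximin_key(u) > leximin_key(v)


def leximin_best(profiles: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick a leximin-optimal profile from a finite candidate set."""
    return max(profiles, key=leximin_key)
```

Note that two profiles with the same total utility, such as (3, 3, 3) and (1, 4, 4), are ranked differently: leximin prefers the first because its worst-off bidder is better off.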
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Computational social choice
1728
Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning
One of the major challenges of current offline reinforcement learning research is dealing with the distribution shift problem caused by the change in state-action visitations under the new policy. To address this issue, we present a novel reward shifting-based method. Specifically, to regularize the behavior of the new policy at each state, we modify the reward received by the new policy by shifting it adaptively according to its proximity to the behavior policy, and apply the reward shifting in opposite directions for in-distribution and out-of-distribution actions. In this way we are able to guide the learning procedure of the new policy itself by explicitly influencing the consequences of its actions, helping it to achieve a better balance between behavior constraints and policy improvement. Empirical results on the popular D4RL benchmarks show that the proposed method obtains competitive performance compared to the state-of-the-art baselines.
List of keywords 
Machine Learning -> ML: Reinforcement learning
1736
A Diffusion Model with Contrastive Learning for ICU False Arrhythmia Alarm Reduction
The high rate of false arrhythmia alarms in intensive care units (ICUs) can negatively impact patient care and lead to slow staff response time due to alarm fatigue. To reduce false alarms in ICUs, previous works proposed conventional supervised learning methods which have inherent limitations in dealing with high-dimensional, sparse, unbalanced, and limited data. We propose a deep generative approach based on the conditional denoising diffusion model to detect false arrhythmia alarms in the ICUs. Conditioning on past waveform data of a patient, our approach generates waveform predictions of the patient during an actual arrhythmia event, and uses the distance between the generated and the observed samples to classify the alarm. We design a network with residual links and self-attention mechanism to capture long-term dependencies in signal sequences, and leverage the contrastive learning mechanism to maximize distances between true and false arrhythmia alarms. We demonstrate the effectiveness of our approach on the MIMIC II arrhythmia dataset for detecting false alarms in both retrospective and real-time settings.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Applications
Machine Learning -> ML: Time series and data streams
1738
Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs
Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions. However, these algorithms struggle to learn policy distributions that are proportional to rewards and have limited exploration capabilities. In this paper, we innovatively formulate the graph active learning problem as a generative process, named GFlowGNN, which generates various samples through sequential actions with probabilities precisely proportional to a predefined reward function. Furthermore, we propose the concept of flow nodes and flow features to efficiently model graphs as flows based on generative flow networks, where the policy network is trained with specially designed rewards. Extensive experiments on real datasets show that the proposed approach has good exploration capability and transferability, outperforming various state-of-the-art methods.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Active learning
1748
Deep Partial Multi-Label Learning with Graph Disambiguation
In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have become prevalent for PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles to extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.
List of keywords 
Machine Learning -> ML: Multi-label
1749
Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach
We propose a framework for learning calibrated uncertainties under domain shifts, considering the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form that concerns the domain shift. In particular, the density ratio estimator yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our proposed method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show an agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.
List of keywords
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-task and transfer learning
1756
Delegated Online Search
In a delegation problem, a principal P with commitment power tries to pick one out of n options. Each option is drawn independently from a known distribution. Instead of inspecting the options herself, P delegates the information acquisition to a rational and self-interested agent A. After inspection, A proposes one of the options, and P can accept or reject. In this paper, we study a natural online variant of delegation, in which the agent searches through the options in an online fashion. How can we design algorithms for P that approximate the utility of her best option in hindsight?
We show that P can obtain a Θ(1/n)-approximation and provide more fine-grained bounds independent of n based on two parameters. If the ratio of maximum and minimum utility for A is bounded by a factor α, we obtain an Ω(log log α / log α)-approximation algorithm and show that this is best possible. If P cannot distinguish options with the same value for herself, we show that ratios polynomial in 1/α cannot be avoided. If the utilities of P and A for each option are related by a factor β, we obtain an Ω(1 / log β)-approximation, and O(log log β / log β) is best possible.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Agent-based and Multi-agent Systems -> MAS: Agent communication
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
1758
Progressive Label Propagation for Semi-Supervised Multi-Dimensional Classification
In multi-dimensional classification (MDC), each training example is associated with multiple class variables from different class spaces. However, it is rather costly to collect labeled MDC examples which have to be annotated from several dimensions (class spaces). To reduce the labeling cost, we attempt to deal with the MDC problem under the semi-supervised learning setting. Accordingly, a novel MDC approach named PLAP is proposed to solve the resulting semi-supervised MDC problem. Overall, PLAP works under the label propagation framework to utilize unlabeled data. To further consider dependencies among class spaces, PLAP deals with each class space in a progressive manner, where the previous propagation results will be used to initialize the current propagation procedure and all processed class spaces and the current one will be regarded as an entirety. Experiments validate the effectiveness of the proposed approach.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning
1762
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1763
A New Variable Ordering for In-processing Bounded Variable Elimination in SAT Solvers
Bounded Variable Elimination (BVE) is an important Boolean formula simplification technique in which the variable ordering is crucial. We define a new variable ordering based on variable activity, called ESA (variable Elimination Scheduled by Activity), for in-processing BVE in Conflict-Driven Clause Learning (CDCL) SAT solvers, and incorporate it into several state-of-the-art CDCL SAT solvers. Experimental results show that the new ESA ordering consistently makes these solvers solve more instances on a benchmark set that includes all 5675 instances used in the Crafted, Application and Main tracks of all SAT Competitions up to 2022. In particular, one of these solvers with ESA, Kissat_MAB_ESA, won the Anniversary track of the SAT Competition 2022. The behaviour of ESA and the reasons for its effectiveness are also analyzed.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
1778
Learning to Self-Reconfigure for Freeform Modular Robots via Altruism Proximal Policy Optimization
The advantages of modular robot systems stem from their ability to change between different configurations, enabling them to adapt to complex and dynamic real-world environments. How to change the configuration of a modular robot system accurately and efficiently, i.e., the self-reconfiguration problem, is therefore essential. Existing reconfiguration algorithms are based on discrete motion primitives and are suitable for lattice-type modular robots. The modules of freeform modular robots, however, are connected without alignment, and the motion space is continuous, which renders existing reconfiguration methods infeasible. In this paper, we design a parallel distributed self-reconfiguration algorithm for freeform modular robots based on multi-agent reinforcement learning to realize the automatic design of conflict-free reconfiguration controllers in continuous action spaces. To avoid conflicts, we incorporate a collaborative mechanism into reinforcement learning. Furthermore, we design distributed termination criteria to achieve timely termination in the presence of limited communication and local observability. Simulations show that, compared to the baselines, the proposed method improves efficiency and congruence, and that module movement demonstrates altruism.
List of keywords
Robotics -> ROB: Learning in robotics
Robotics -> ROB: Multi-robot systems
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
1793
Proportionally Fair Online Allocation of Public Goods with Predictions
We design online algorithms for fair allocation of public goods to a set of N agents over a sequence of T rounds and focus on improving their performance using predictions. In the basic model, a public good arrives in each round, and every agent reveals their value for it upon arrival. The algorithm must irrevocably decide the investment in this good without exceeding a total budget of B across all rounds. The algorithm can utilize (potentially noisy) predictions of each agent’s total value for all remaining goods. The algorithm’s performance is measured using a proportional fairness objective, which informally demands that every group of agents be rewarded proportional to its size and the cohesiveness of its preferences. We show that no algorithm can achieve better than Θ(T/B) proportional fairness without predictions. With reasonably accurate predictions, the situation improves significantly, and Θ(log(T/B)) proportional fairness is achieved. We also extend our results to a general setting wherein a batch of L public goods arrive in each round and O(log(min(N,L)T/B)) proportional fairness is achieved. Our exact bounds are parameterized as a function of the prediction error, with performance degrading gracefully with increasing errors.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Fair division
1796
Basket Representation Learning by Hypergraph Convolution on Repeated Items for Next-basket Recommendation
Basket representation plays an important role in the task of next-basket recommendation. However, existing methods generally adopt pooling operations to learn a basket’s representation, from which two critical issues can be identified. First, they treat a basket as a set of independent and identically distributed items. By conducting data analysis on a real dataset, we find that items occurring in the same basket have much higher correlations than randomly selected items. Second, although some works have recognized the importance of items repeatedly purchased in multiple baskets, they ignore the correlations among the repeated items in the same basket, whose importance is shown by our data analysis. In this paper, we propose a novel Basket Representation Learning (BRL) model by leveraging the correlations among intra-basket items. Specifically, we first connect all the items in a basket as a hyperedge, where the correlations among different items can be well exploited by hypergraph convolution operations. Meanwhile, we also connect all the repeated items in the same basket as a hyperedge, whereby their correlations can be further strengthened. We generate a negative (positive) view of the basket by data augmentation on repeated (non-repeated) items, and apply contrastive learning to enforce more agreement on repeated items. Finally, experimental results on three real datasets show that our approach performs better than eight baselines in ranking accuracy.
List of keywords
Data Mining -> DM: Recommender systems
Data Mining -> DM: Information retrieval
1798
Multi-Modality Deep Network for JPEG Artifacts Reduction
In recent years, many convolutional neural network-based models have been designed for JPEG artifacts reduction and have achieved notable progress. However, few methods are suitable for reducing artifacts in extremely low-bitrate compressed images. The main challenge is that a highly compressed image loses too much information, which makes it difficult to reconstruct a high-quality image. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides the potential prior information of the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, show that our method obtains better deblocking results than state-of-the-art methods.
List of keywords
Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Machine learning for vision
1805
Totally Dynamic Hypergraph Neural Networks
Recent dynamic hypergraph neural networks (DHGNNs) are designed to adaptively optimize the hypergraph structure to avoid the dependence on the initial hypergraph structure, thus capturing more hidden information for representation learning. However, most existing DHGNNs cannot adjust the hyperedge number and thus fail to fully explore the underlying hypergraph structure. This paper proposes a new method, namely, the totally dynamic hypergraph neural network (TDHNN), to adjust the hyperedge number for optimizing the hypergraph structure. Specifically, the proposed method first captures the hyperedge feature distribution and samples from the learned distribution to obtain dynamic hyperedge features rather than fixed ones. The hypergraph is then constructed based on the attention coefficients of both sampled hyperedges and nodes. The node features are dynamically updated by a simple hypergraph convolution algorithm. Experimental results on real datasets demonstrate the effectiveness of the proposed method compared to SOTA methods. The source code can be accessed via https://github.com/HHW-zhou/TDHNN.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
1809
Expanding the Hyperbolic Kernels: A Curvature-aware Isometric Embedding View
Modeling data relations as a hierarchical structure has proven beneficial for many learning scenarios, and the hyperbolic space, with negative curvature, can encode such data hierarchy without distortion. Several recent studies also show that the representation power of the hyperbolic space can be further improved by endowing it with kernel methods. Unfortunately, the known kernel methods developed in hyperbolic space are limited by their adaptation capacity or by distortion issues. This paper addresses these issues through a novel embedding function. To this end, we propose a curvature-aware isometric embedding, which establishes an isometry from the Poincaré model to a special reproducing kernel Hilbert space (RKHS). We can then define a series of kernels on this RKHS, including several positive definite kernels and an indefinite kernel. Thorough experiments are conducted to demonstrate the superiority of our proposals over existing hyperbolic and Euclidean kernels in various learning tasks, e.g., graph learning and zero-shot learning.
List of keywords
Machine Learning -> ML: Kernel methods
Machine Learning -> ML: Geometric learning
1812
Quantifying Harm
In earlier work we defined a qualitative notion of harm: either harm is caused, or it is not. For practical applications, we often need to quantify harm; for example, we may want to choose the least harmful of a set of possible interventions. We first present a quantitative definition of harm in a deterministic context involving a single individual, then we consider the issues involved in dealing with uncertainty regarding the context and going from a notion of harm for a single individual to a notion of "societal harm", which involves aggregating the harm to individuals.  We show that the "obvious" way of doing this (just taking the expected harm for an individual and then summing the expected harm over all individuals) can lead to counterintuitive or inappropriate answers, and discuss alternatives, drawing on work from the decision-theory literature.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Decision and utility theory
1813
A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previously structured map method provides the average historical appearance of visited nodes, while it ignores distinctive contributions of various images and potent information retention in the reasoning process. This work proposes a dual semantic-aware recurrent global-adaptive network (DSRG) to address the above problems. First, DSRG proposes an instruction-guidance linguistic module (IGL) and an appearance-semantics visual module (ASV) for boosting vision and language semantic learning respectively. For the memory mechanism, a global adaptive aggregation module (GAA) is devised for explicit panoramic observation fusion, and a recurrent memory fusion module (RMF) is introduced to supply implicit temporal hidden states. Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods. Code is available at https://github.com/CrystalSixone/DSRG.
List of keywords
Computer Vision -> CV: Vision and language
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
1816
Cardinality-Minimal Explanations for Monotonic Neural Networks
In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute  minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether  tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Theory of deep learning
1817
Explainable Multi-Agent Reinforcement Learning for Temporal Queries
As multi-agent reinforcement learning (MARL) systems are increasingly deployed throughout society, it is imperative yet challenging for users to understand the emergent behaviors of MARL agents in complex environments. This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query, which specifies a sequence of tasks completed by agents with possible cooperation. The proposed approach encodes the temporal query as a PCTL* logic formula and checks if the query is feasible under a given MARL policy via probabilistic model checking. Such explanations can help reconcile discrepancies between the actual and anticipated multi-agent behaviors. The proposed approach also generates correct and complete explanations to pinpoint reasons that make a user query infeasible. We have successfully applied the proposed approach to four benchmark MARL domains (up to 9 agents in one domain). Moreover, the results of a user study show that the generated explanations significantly improve user performance and satisfaction.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
1826
Levin Tree Search with Context Models
Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work we show that the neural network can be substituted with parameterized context models originating from the online compression literature (LTS+CM). We show that the LTS loss is convex under this new model, which allows for using standard convex optimization tools, and obtain convergence guarantees to the optimal parameters in an online setting for a given set of solution trajectories, guarantees that cannot be provided for neural networks. The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik’s cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
List of keywords
Search -> S: Search and machine learning
Search -> S: Heuristic search
1829
Multi-Task Learning via Time-Aware Neural ODE
Multi-Task Learning (MTL) is a well-established paradigm for learning shared models for a diverse set of tasks. Moreover, MTL improves data efficiency by training all tasks jointly. However, directly optimizing the losses of all the tasks may lead to imbalanced performance across tasks due to the competition among tasks for the shared parameters in MTL models. Many MTL methods try to mitigate this problem by dynamically weighting task losses or manipulating task gradients. Different from existing studies, in this paper, we propose a Neural Ordinary diffeRential equation based Multi-tAsk Learning (NORMAL) method to alleviate this issue by modeling task-specific feature transformations from the perspective of dynamic flows built on the Neural Ordinary Differential Equation (NODE). Specifically, the proposed NORMAL model designs a time-aware neural ODE block that automatically learns, via gradient descent methods, task-specific time information, which determines the task positions of feature transformations in the dynamic flow of the NODE. In this way, the proposed NORMAL model handles the problem of competing shared parameters by learning task positions. Moreover, the learned task positions can be used to measure the relevance among different tasks. Extensive experiments show that the proposed NORMAL model outperforms state-of-the-art MTL models.
List of keywords 
Machine Learning -> ML: Multi-task and transfer learning
1830
New Bounds and Constraint Programming Models for the Weighted Vertex Coloring Problem
This paper addresses the weighted vertex coloring problem (WVCP), which is an NP-hard variant of the graph coloring problem with various applications. Given a vertex-weighted graph, the problem consists of partitioning vertices into independent sets (colors) so as to minimize the sum of the maximum weights of the colors. We first present an iterative procedure to reduce the size of WVCP instances and prove new upper bounds on the objective value and the number of colors. Alternative constraint programming models are then introduced, which rely on primal and dual encodings of the problem and use symmetry breaking constraints. A large number of experiments are conducted on benchmark instances. We analyze the impact of using specific bounds to reduce the search space and speed up the exact resolution of instances. New optimality proofs are reported for some benchmark instances.
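The WVCP objective stated in the abstract above can be made concrete with a small sketch. The function name `wvcp_cost`, the dictionary-based graph encoding, and the toy instance are illustrative assumptions and are not taken from the paper; the sketch only evaluates a given coloring and does not implement the paper's bounds or constraint programming models. It also assumes non-negative vertex weights.

```python
from collections import defaultdict


def wvcp_cost(weights, edges, coloring):
    """Return the WVCP objective of `coloring`: the sum, over colors,
    of the heaviest vertex in each color class.

    Raises ValueError if two adjacent vertices share a color, i.e. if
    some color class is not an independent set.
    Assumes non-negative weights (hypothetical helper, not from the paper).
    """
    for u, v in edges:
        if coloring[u] == coloring[v]:
            raise ValueError(f"adjacent vertices {u} and {v} share a color")
    heaviest = defaultdict(int)  # color -> maximum weight seen so far
    for vertex, color in coloring.items():
        heaviest[color] = max(heaviest[color], weights[vertex])
    return sum(heaviest.values())


# Toy instance: a path a - b - c - d with vertex weights.
weights = {"a": 5, "b": 3, "c": 4, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]

two_colors = {"a": 0, "c": 0, "b": 1, "d": 1}    # classes {a, c} and {b, d}
three_colors = {"a": 0, "d": 0, "b": 1, "c": 2}  # classes {a, d}, {b}, {c}

print(wvcp_cost(weights, edges, two_colors))    # 5 + 3 = 8
print(wvcp_cost(weights, edges, three_colors))  # 5 + 3 + 4 = 12
```

On this toy instance, grouping the two heaviest non-adjacent vertices in one color lowers the cost, which illustrates why the objective differs from simply minimizing the number of colors.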
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Search -> S: Combinatorial search and optimisation
1831
Norm Deviation in Multiagent Systems: A Foundation for Responsible Autonomy
The power of norms in both human societies and sociotechnical systems arises from the facts that (1) societal norms, including laws and policies, characterize acceptable behavior in high-level terms and (2) they are not hard controls and can be deviated from. Thus, the design of responsibly autonomous agents faces an essential tension: these agents must both (1) respect applicable norms and (2) deviate from those norms when blindly following them may lead to diminished outcomes. We propose a conceptual foundation for norm deviation. As a guiding framework, we adopt Habermas’s theory of communicative action comprising objective, subjective, and practical validity claims regarding the suitability of deviation. Our analysis thus goes beyond previous studies of norm deviation and yields reasoning guidelines uniting norms and values by which to develop responsible agents.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Normative systems
1833
Probabilistic Masked Attention Networks for Explainable Sequential Recommendation
Transformer-based models are powerful for modeling temporal dynamics of user preference in sequential recommendation. Most of the variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attention inevitably assigns probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. Here we propose a Probabilistic Masked Attention Network (PMAN) to identify sparse attention patterns, which are more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attention under a constrained optimization framework. As such, PMAN can select, in a data-driven fashion, which information is critical to retain and which can be dropped. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly.
List of keywords
Data Mining -> DM: Collaborative filtering
Data Mining -> DM: Information retrieval
Data Mining -> DM: Recommender systems
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
1834
A Unifying Formal Approach to Importance Values in Boolean Functions
Boolean functions and their representation through logics, circuits, machine learning classifiers, or binary decision diagrams (BDDs) play a central role in the design and analysis of computing systems. Quantifying the relative impact of variables on the truth value by means of importance values can provide useful insights to steer system design and debugging. In this paper, we introduce a uniform framework for reasoning about such values, relying on a generic notion of importance value functions (IVFs). The class of IVFs is defined by axioms motivated from several notions of importance values introduced in the literature, including Ben-Or and Linial’s influence and Chockler, Halpern, and Kupferman’s notion of responsibility and blame. We establish a connection between IVFs and game-theoretic concepts such as Shapley and Banzhaf values, both of which measure the impact of players on outcomes in cooperative games. Exploiting BDD-based symbolic methods and projected model counting, we devise and evaluate practical computation schemes for IVFs.
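One of the importance values named in the abstract above, the influence of a variable in the sense of Ben-Or and Linial, can be illustrated by brute force: under the uniform distribution, it is the fraction of assignments on which flipping that variable changes the function value. The sketch below is only a truth-table enumeration under that assumption; the function name `influence` and the example functions are hypothetical, and it is not the BDD-based or model-counting scheme the paper devises.

```python
from itertools import product


def influence(f, n_vars, i):
    """Fraction of the 2**n_vars assignments on which flipping
    variable `i` changes the value of the Boolean function `f`.

    Brute-force enumeration, exponential in n_vars (illustrative only).
    """
    changed = 0
    for bits in product((False, True), repeat=n_vars):
        flipped = bits[:i] + (not bits[i],) + bits[i + 1:]
        if f(*bits) != f(*flipped):
            changed += 1
    return changed / 2 ** n_vars


def majority(a, b, c):
    # True when at least two of the three inputs are True.
    return (a + b + c) >= 2


def guarded_or(a, b, c):
    # a AND (b OR c): variable a acts as a gate for the other two.
    return a and (b or c)


print([influence(majority, 3, i) for i in range(3)])    # [0.5, 0.5, 0.5]
print([influence(guarded_or, 3, i) for i in range(3)])  # [0.75, 0.25, 0.25]
```

In the majority function all variables are equally important, whereas in `guarded_or` the gating variable has three times the influence of either of the others.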
List of keywords
Game Theory and Economic Paradigms -> GTEP: Cooperative games
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Constraint Satisfaction and Optimization -> CSO: Satisfiability
1836
Learning Constraint Networks over Unknown Constraint Languages
Constraint acquisition is the task of learning a constraint network from examples of solutions and non-solutions. Existing constraint acquisition systems typically require advance knowledge of the target network’s constraint language, which significantly narrows their scope of applicability. In this paper we propose a constraint acquisition method that computes a suitable constraint language as part of the learning process, eliminating the need for any advance knowledge. We report preliminary experiments on various acquisition benchmarks.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Constraint Satisfaction and Optimization -> CSO: Constraint programming
1838
On Translations between ML Models for XAI Purposes
In this paper, the succinctness of various ML models is studied. To be more precise, the existence of polynomial-time and polynomial-space translations between representation languages for classifiers is investigated. The languages that are considered include decision trees, random forests, several types of boosted trees, binary neural networks, Boolean multilayer perceptrons, and various logical representations of  binary classifiers. We provide a complete map indicating for every pair of languages C, C’ whether or not a polynomial-time / polynomial-space translation exists from C to C’. We also explain how to take advantage of the resulting map for XAI purposes.
List of keywords
Knowledge Representation and Reasoning -> KRR: Knowledge compilation
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
1842
Game Theory with Simulation of Other Players
Game-theoretic interactions with AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to simulate an AI agent (for example because its source code is known), which allows others to accurately predict the agent’s actions. This could lower the bar for trust and cooperation. In this paper, we first formally define games in which one player can simulate another at a cost, and derive some basic properties of such games. Then, we prove a number of results for such games, including: (1) introducing simulation into generic-payoff normal-form games makes them easier to solve; (2) if the only obstacle to cooperation is a lack of trust in the possibly-simulated agent, simulation enables equilibria that improve the outcome for both agents; and (3) however, there are settings where introducing simulation results in strictly worse outcomes for both players.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
1846
Gapformer: Graph Transformer with Graph Pooling for Node Classification
Graph Transformers (GTs) have proved their advantage in graph-level tasks. However, existing GTs still perform unsatisfactorily on the node classification task due to 1) the overwhelming unrelated information obtained from a vast number of irrelevant distant nodes and 2) the quadratic complexity regarding the number of nodes via the fully connected attention mechanism. In this paper, we present Gapformer, a method for node classification that deeply incorporates Graph Transformer with Graph Pooling. More specifically, Gapformer coarsens the large-scale nodes of a graph into a smaller number of pooling nodes via local or global graph pooling methods, and then computes the attention solely with the pooling nodes rather than all other nodes. In such a manner, the negative influence of the overwhelming unrelated nodes is mitigated while maintaining the long-range information, and the quadratic complexity is reduced to linear complexity with respect to the fixed number of pooling nodes. Extensive experiments on 13 node classification datasets, including homophilic and heterophilic graph datasets, demonstrate the competitive performance of Gapformer over existing Graph Neural Networks and GTs.
List of keywords
Data Mining -> DM: Mining graphs
Data Mining -> DM: Networks
1850
MultiPar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations
As we move closer to real-world social AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behaviors.
List of keywords
Machine Learning -> ML: Attention models
Computer Vision -> CV: Video analysis and understanding
Humans and AI -> HAI: Computer-aided education
1856
Towards a Better Understanding of Learning with Multiagent Teams
While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
1868
Anticipatory Fictitious Play
Fictitious play is an algorithm for computing Nash equilibria of matrix games. Recently, machine learning variants of fictitious play have been successfully applied to complicated real-world games. This paper presents a simple modification of fictitious play which is a strict improvement over the original: it has the same theoretical worst-case convergence rate, is equally applicable in a machine learning context, and enjoys superior empirical performance. We conduct an extensive comparison of our algorithm with fictitious play, proving an optimal O(1/t) convergence rate for certain classes of games, demonstrating superior performance numerically across a variety of games, and concluding with experiments that extend these algorithms to the setting of deep multiagent reinforcement learning.
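For readers unfamiliar with the baseline, the following is a minimal sketch of classical fictitious play on a zero-sum matrix game, where each player repeatedly best-responds to the opponent's empirical action frequencies. It is not the anticipatory variant proposed in the paper. The function names, the lowest-index tie-breaking, the arbitrary initial actions, and the rock-paper-scissors example are illustrative assumptions.

```python
def fictitious_play(payoff, rounds):
    """Classical fictitious play on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets
    its negative. Returns both players' empirical mixed strategies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial actions (an assumption)
    col_counts[0] += 1
    for _ in range(rounds - 1):
        # Cumulative payoff of each action against the opponent's history.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Row player maximizes, column player minimizes; ties go to the
        # lowest index because max/min return the first extreme element.
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return ([c / rounds for c in row_counts],
            [c / rounds for c in col_counts])


def exploitability(payoff, x, y):
    """Gap between the best responses to y and to x; zero at equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_vs_y = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                    for i in range(n_rows))
    best_vs_x = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                    for j in range(n_cols))
    return best_vs_y - best_vs_x


# Rock-paper-scissors: rows and columns ordered rock, paper, scissors.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
x, y = fictitious_play(rps, rounds=10_000)
print(x, y, exploitability(rps, x, y))
```

On rock-paper-scissors the empirical strategies drift toward the uniform equilibrium and the exploitability shrinks as the number of rounds grows, although the actions played in individual rounds keep cycling.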
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
1875
Advancing Post-Hoc Case-Based Explanation with Feature Highlighting
Explainable AI (XAI) has been proposed as a valuable tool to assist in downstream tasks involving human-AI collaboration. Perhaps the most psychologically valid XAI techniques are case-based approaches which display "whole" exemplars to explain the predictions of black-box AI systems. However, for such post-hoc XAI methods dealing with images, there has been no attempt to improve their scope by using multiple clear feature "parts" of the images to explain the predictions while linking back to relevant cases in the training data, thus allowing for more comprehensive explanations that are faithful to the underlying model. Here, we address this gap by proposing two general algorithms (latent and superpixel-based) which can isolate multiple clear feature parts in a test image, and then connect them to the explanatory cases found in the training data, before testing their effectiveness in a carefully designed user study. Results demonstrate that the proposed approach appropriately calibrates a user’s feelings of "correctness" for ambiguous classifications in real world data on the ImageNet dataset, an effect which does not happen when just showing the explanation without feature highlighting.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
1883
On the Paradox of Learning to Reason from Data
Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has, in fact, learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models (e.g. T5) and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.
List of keywords
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
1884
Improve Video Representation with Temporal Adversarial Augmentation
Recent works reveal that adversarial augmentation benefits the generalization of neural networks (NNs) if used in an appropriate manner. In this paper, we introduce Temporal Adversarial Augmentation (TA), a novel video augmentation technique that utilizes temporal attention. Unlike conventional adversarial augmentation, TA is specifically designed to shift the attention distributions of neural networks with respect to video clips by maximizing a temporal-related loss function. We demonstrate that TA obtains diverse temporal views, which significantly affect the focus of neural networks. Training with these examples remedies the flaw of unbalanced temporal information perception and enhances the ability to defend against temporal shifts, ultimately leading to better generalization. To leverage TA, we propose the Temporal Video Adversarial Fine-tuning (TAF) framework for improving video representations. TAF is a model-agnostic, generic, and interpretability-friendly training strategy. We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-Something V1&V2 and Diving48). Experimental results demonstrate that TAF improves the test accuracy of these models by notable margins without introducing additional parameters or computational costs. As a byproduct, TAF also improves the robustness under out-of-distribution (OOD) settings. Code is available at https://github.com/jinhaoduan/TAF.
List of keywords
Computer Vision -> CV: Video analysis and understanding
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Representation learning
1889
One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction
We propose a universal Graph Neural Network architecture which can be trained as an end-to-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem-specific heuristics for any CSP in a purely data-driven manner.
The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs.
We perform a thorough empirical evaluation where we learn heuristics for well-known and important CSPs, both decision and optimisation problems, from random data, including graph coloring, MAXCUT, MAX-k-SAT, and the general RB model. Our approach significantly outperforms prior end-to-end approaches for neural combinatorial optimization. It can compete with conventional heuristics and solvers on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.
List of keywords
Machine Learning -> ML: Reinforcement learning
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
Machine Learning -> ML: Sequence and graph learning
1918
Contrastive Learning for Sign Language Recognition and Translation
Two problems are widespread in current end-to-end sign language processing architectures. One is the CTC spike phenomenon, which weakens the visual representational ability in Continuous Sign Language Recognition (CSLR). The other is the exposure bias problem, which leads to the accumulation of translation errors during inference in Sign Language Translation (SLT). In this paper, we tackle these issues by introducing contrastive learning, aiming to enhance both visual-level feature representation and semantic-level error tolerance. Specifically, to alleviate the CTC spike phenomenon and enhance visual-level representation, we design a visual contrastive loss that minimizes the visual feature distance between different augmented samples of frames in one sign video, so that the model can further explore features by utilizing numerous unlabeled frames in an unsupervised way. To alleviate the exposure bias problem and improve semantic-level error tolerance, we design a semantic contrastive loss by re-inputting the predicted sentence into the semantic module and comparing the features of the ground-truth sequence and the predicted sequence, thereby exposing the model to its own mistakes. Besides, we propose two new metrics, i.e., Blank Rate and Consecutive Wrong Word Rate, to directly reflect our improvement on the two problems. Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
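One common way to realise a loss that pulls together the features of two augmented views of the same frame is an InfoNCE-style objective. The sketch below is a generic illustration in plain Python and is not the loss defined in the paper; the function names, the use of cosine similarity, and the temperature value are assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view_a: List[Vector], view_b: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE-style loss (illustrative, not the paper's loss).

    view_a[i] and view_b[i] are features of two augmentations of frame i
    (the positive pair); view_b[j] with j != i act as negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / n
```

The loss is small when each frame's two views are closer to each other than to views of other frames, and grows when the pairing is violated.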
List of keywords
Computer Vision -> CV: Vision and language
Computer Vision -> CV: Action and behavior recognition
1920
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
Collaborative tasks often begin with partial task knowledge and incomplete plans from each partner. 
To complete these tasks, partners need to engage in situated communication with each other and coordinate their partial plans towards a complete plan to achieve a joint task goal.
While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collaboration. 
To address this limitation, this paper takes a step towards Collaborative Plan Acquisition, where humans and agents strive to learn and communicate with each other to acquire a complete plan for joint tasks. 
Specifically, we formulate a novel problem for agents to predict the missing task knowledge for themselves and for their partners based on rich perceptual and dialogue history. 
We extend a situated dialogue benchmark for symmetric collaborative tasks in a 3D blocks world and investigate computational strategies for plan acquisition. 
Our empirical results suggest that predicting the partner’s missing knowledge is a more viable approach than predicting one’s own. 
We show that explicit modeling of the partner’s dialogue moves and mental states produces improved and more stable results than without.
These results provide insight for future AI agents that can predict what knowledge their partner is missing and, therefore, can proactively communicate such information to help the partner acquire such missing knowledge toward a common understanding of joint tasks.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
1927
CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization
Ultra-fine-grained visual classification (ultra-FGVC) aims to classify sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works consider explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves the generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Applications
1932
Incorporating Unlikely Negative Cues for Distinctive Image Captioning
While recent neural image captioning models have shown great promise in terms of automatic metrics, they still tend to generate generic sentences, which limits their use to only a handful of simple scenarios. On the other hand, negative training has been suggested as an effective way to prevent models from producing frequent yet meaningless sentences. However, when applied to image captioning, this approach may overlook low-frequency but generic and vague sentences, which can be problematic when dealing with diverse and changeable visual scenes. In this paper, we introduce an approach to improve image captioning by integrating negative knowledge that focuses on preventing the model from producing undesirable generic descriptions while addressing previous limitations. We accomplish this by training a negative teacher model that generates image-wise generic sentences with retrieval entropy-filtered data. Subsequently, the student model is required to maximize the distance with multi-level negative knowledge transferring for optimal guiding. Empirical results on the MS COCO benchmark confirm that our plug-and-play framework incorporating unlikely negative knowledge leads to significant improvements in both accuracy and diversity, surpassing previous state-of-the-art methods for distinctive image captioning.
List of keywords
Computer Vision -> CV: Vision and language
Machine Learning -> ML: Learning preferences or rankings
1938
RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation
Source-free domain adaptation transfers the source-trained model to the target domain without exposing the source data, aiming to dispel concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the Black-Box setting only allows the use of the source model’s outputs, but it suffers even more severely from overfitting on the source domain because the source model’s weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for Black-Box domain adaptation using both input-level and network-level regularization. At the input level, we design a new data augmentation technique called Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. At the network level, we develop a Subnetwork Distillation mechanism to transfer knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.
List of keywords
Machine Learning -> ML: Multi-task and transfer learning
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1953
Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking
Tracking individuals is a vital part of many experiments conducted to understand collective behaviour. Ants are the paradigmatic model system for such experiments, but their lack of individually distinguishing visual features and their high colony densities make it extremely difficult to perform reliable tracking automatically. Additionally, the wide diversity of their species’ appearances makes a generalized approach even harder. In this paper, we propose a data-driven multi-object tracker that, for the first time, employs domain adaptation to achieve the required generalisation. This approach is built upon a joint-detection-and-tracking framework that is extended by a set of domain discriminator modules integrating an adversarial training strategy in addition to the tracking loss. In addition to this novel domain-adaptive tracking framework, we present a new dataset and a benchmark for the ant tracking problem. The dataset contains 57 video sequences with full trajectory annotation, including 30k frames captured from two different ant species moving on different background patterns. It comprises 33 and 24 sequences for source and target domains, respectively. We compare our proposed framework against other domain-adaptive and non-domain-adaptive multi-object tracking baselines using this dataset and show that incorporating domain adaptation at multiple levels of the tracking pipeline yields significant improvements. The code and the dataset are available at https://github.com/chamathabeysinghe/da-tracker.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Applications
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
1957
Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies
This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Computer games
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Search -> S: Local search
Machine Learning -> ML: Symbolic methods
1966
Inferring Private Valuations from Behavioral Data in Bilateral Sequential Bargaining
Inferring bargainers’ private valuations on items from their decisions is crucial for analyzing their strategic behaviors in bilateral sequential bargaining. Most existing approaches that infer agents’ private information from observable data either rely on strong equilibrium assumptions or require a careful design of agents’ behavior models. To overcome these weaknesses, we propose a Bayesian Learning-based Valuation Inference (BLUE) framework. Our key idea is to derive feasible intervals of bargainers’ private valuations from their behavior data, using the fact that most bargainers do not choose strictly dominated strategies. We leverage these feasible intervals to guide our inference. Specifically, we first model each bargainer’s behavior function (which maps his valuation and bargaining history to decisions) via a recurrent neural network. Second, we learn these behavior functions by utilizing a novel loss function defined based on feasible intervals. Third, we derive the posterior distributions of bargainers’ valuations according to their behavior data and learned behavior functions. Moreover, we account for the heterogeneity of bargainer behaviors, and propose a clustering algorithm (K-Loss) to improve the efficiency of learning these behaviors. Experiments on both synthetic and real bargaining data show that our inference approach outperforms baselines.
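The idea of deriving feasible intervals from the absence of strictly dominated strategies can be illustrated with a deliberately simplified rule. The sketch below is not the BLUE procedure; it assumes a one-item bargain in which a buyer never offers or accepts a price above his valuation and a seller never asks or accepts a price below hers. The function name, the role labels, and the prior bounds `lo` and `hi` are hypothetical.

```python
from typing import List, Tuple


def feasible_interval(role: str, own_prices: List[float],
                      lo: float = 0.0,
                      hi: float = float("inf")) -> Tuple[float, float]:
    """Bound a bargainer's private valuation from prices they proposed or accepted.

    Simplifying assumption (not from the paper): paying more than one's
    valuation (buyer) or selling below it (seller) is strictly dominated,
    so every price a bargainer commits to is a one-sided bound.
    """
    if not own_prices:
        return (lo, hi)  # no behaviour observed, keep the prior bounds
    if role == "buyer":
        # The valuation is at least the highest price the buyer was willing to pay.
        return (max(lo, max(own_prices)), hi)
    if role == "seller":
        # The valuation is at most the lowest price the seller was willing to take.
        return (lo, min(hi, min(own_prices)))
    raise ValueError("role must be 'buyer' or 'seller'")
```

For example, a buyer who offered 60, 75 and then 80 against a list price of 200 would be assigned the interval [80, 200] under these assumptions.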
List of keywords
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Other
1983
XFormer: Fast and Accurate Monocular 3D Body Capture
We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input. The proposed network architecture contains two branches: a keypoint branch that estimates 3D human mesh vertices given 2D keypoints, and an image branch that makes predictions directly from the RGB image features. At the core of our method is a cross-modal transformer block that allows information flow across these two branches by modeling the attention between 2D keypoint coordinates and image spatial features. Our architecture is designed so that it can be trained on various types of datasets, including images with 2D/3D annotations, images with 3D pseudo labels, and motion capture datasets that do not have associated images. This effectively improves the accuracy and generalization ability of our system. Built on a lightweight backbone (MobileNetV3), our method runs fast (over 30fps on a single CPU core) and still yields competitive accuracy. Furthermore, with an HRNet backbone, XFormer delivers state-of-the-art performance on the Human3.6M and 3DPW datasets.
List of keywords
Computer Vision -> CV: 3D computer vision
2002
A Novel Learnable Interpolation Approach for Scale-Arbitrary Image Super-Resolution
Deep convolutional neural networks (CNNs) have achieved unprecedented success in single image super-resolution over the past few years. Meanwhile, there is an increasing demand for single image super-resolution with arbitrary scale factors in real-world scenarios. Many approaches adopt scale-specific multi-path learning to cope with multi-scale super-resolution with a single network. However, these methods require a large number of parameters. To achieve a better balance between reconstruction quality and parameter count, we propose a learnable interpolation method that leverages the advantages of neural networks and interpolation methods to tackle the scale-arbitrary super-resolution task. The scale factor is treated as a function parameter for generating the kernel weights for the learnable interpolation. We demonstrate that the learnable interpolation builds a bridge between neural networks and traditional interpolation methods. Experiments show that the proposed learnable interpolation requires far fewer parameters and outperforms state-of-the-art super-resolution methods.
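The statement that the scale factor parameterises a function which generates the interpolation kernel weights can be made concrete in one dimension. The sketch below is an illustration under assumptions, not the paper's method: `weight_fn` stands in for the learned weight generator, and the default `linear_weights` simply reproduces classical linear interpolation (ignoring the scale), which shows how a traditional interpolator is one special case of such a generator.

```python
import math
from typing import Callable, List

WeightFn = Callable[[float, float], List[float]]


def linear_weights(offset: float, scale: float) -> List[float]:
    """Two-tap kernel of plain linear interpolation.

    `scale` is unused here; a learned generator (hypothetical) could
    condition its output on it.
    """
    return [1.0 - offset, offset]


def resample_1d(signal: List[float], scale: float,
                weight_fn: WeightFn = linear_weights) -> List[float]:
    """Resample a 1D signal (length >= 2) by an arbitrary scale factor."""
    n_in = len(signal)
    n_out = int(round(n_in * scale))
    out = []
    for i in range(n_out):
        # Map the output sample centre back to source coordinates.
        src = (i + 0.5) / scale - 0.5
        # Clamp to the valid range so border samples replicate the edge.
        src = min(max(src, 0.0), n_in - 1.0)
        left = min(int(math.floor(src)), n_in - 2)
        offset = src - left
        w = weight_fn(offset, scale)
        out.append(w[0] * signal[left] + w[1] * signal[left + 1])
    return out
```

Calling `resample_1d([0.0, 2.0, 4.0, 6.0], 1.5)` yields six samples; swapping in a different `weight_fn` changes the kernel without touching the resampling loop.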
List of keywords
Computer Vision -> CV: Other
Machine Learning -> ML: Convolutional networks
2005
Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint
This work, for the first time, introduces DLA and RLA, two constant-factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size n subject to a knapsack constraint. DLA is a deterministic algorithm that provides an approximation factor of nearly 6, while RLA is a randomized algorithm with an approximation factor of nearly 4. Both run in linear query complexity. The key idea for obtaining a constant approximation ratio with linear query complexity lies in: (1) dividing the ground set into two appropriate subsets to find the near-optimal solution over these subsets with linear queries, and (2) combining a threshold greedy with properties of two disjoint sets or a random selection process to improve solution quality. In addition to the theoretical analysis, we have evaluated our proposed solutions with three applications: Revenue Maximization, Image Summarization, and Maximum Weighted Cut, showing that our algorithms not only return results comparable to state-of-the-art algorithms but also require significantly fewer queries.
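The threshold-greedy ingredient mentioned above can be sketched generically. The code below is neither DLA nor RLA and carries no approximation guarantee; it is a single-pass density-threshold rule that evaluates one marginal gain per element, which is what keeps the number of queries linear. The function names, the threshold `tau`, and the toy weighted-cut objective are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def cut_value(edges: Dict[Tuple[int, int], float], chosen: Set[int]) -> float:
    """Weighted cut: total weight of edges with exactly one endpoint chosen.

    Cut functions are submodular and non-monotone.
    """
    return sum(w for (u, v), w in edges.items()
               if (u in chosen) != (v in chosen))


def threshold_greedy(items: Iterable[int], cost: Dict[int, float],
                     budget: float, f: Callable[[Set[int]], float],
                     tau: float) -> Tuple[Set[int], float]:
    """Single pass: keep an item if it fits the budget and its
    marginal gain per unit cost is at least tau."""
    chosen: Set[int] = set()
    spent = 0.0
    current = f(chosen)
    for e in items:
        if spent + cost[e] > budget:
            continue  # would violate the knapsack constraint
        gain = f(chosen | {e}) - current  # one oracle query per item
        if gain / cost[e] >= tau:
            chosen.add(e)
            spent += cost[e]
            current += gain
    return chosen, current


# Toy instance: a 4-cycle with unit costs and a budget of 2.
edges = {(0, 1): 3.0, (1, 2): 2.0, (2, 3): 4.0, (0, 3): 1.0}
unit_cost = {v: 1.0 for v in range(4)}
solution, value = threshold_greedy(range(4), unit_cost, 2.0,
                                   lambda s: cut_value(edges, s), tau=1.0)
```

On this instance vertex 1 is skipped because adding it would lower the cut value, which shows why a threshold on marginal gain matters when the objective is non-monotone.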
List of keywords
Machine Learning -> ML: Optimization
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
2012
BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning
Federated learning (FL) is a promising distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, which makes finding the optimal reward budget allocation difficult. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization named BARA (Budget Allocation for Reverse Auction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records, so that the reward budget allocated to each communication round is dynamically optimized to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.
List of keywords
Machine Learning -> ML: Federated learning
2015
Reinforcement Learning Approaches for Traffic Signal Control under Missing Data
Reinforcement learning (RL) methods have achieved promising results in traffic signal control (TSC) tasks. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observations of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observations. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not equipped with sensors and thus yield no direct observations. To the best of our knowledge, we are the first to use RL methods to tackle the TSC problem in this real-world setting. Specifically, we propose two solutions: 1) imputing the traffic states to enable adaptive control, and 2) imputing both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also investigate how missing data influences the performance of our model.
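As a rough illustration of the first solution (imputing traffic states for intersections without sensors), the sketch below fills each unobserved intersection with the mean state of its observed neighbours. This is a placeholder imputer chosen for simplicity, not the imputation model used in the paper; the function name, the scalar state (for example a queue length), and the fallback value are assumptions.

```python
from typing import Dict, List


def impute_states(observed: Dict[str, float],
                  neighbors: Dict[str, List[str]],
                  fallback: float = 0.0) -> Dict[str, float]:
    """Return a state for every intersection in `neighbors`.

    Observed intersections keep their measured state; unobserved ones get
    the mean of their observed neighbours, or `fallback` if none is observed.
    """
    full = dict(observed)
    for node, nbrs in neighbors.items():
        if node in observed:
            continue
        known = [observed[n] for n in nbrs if n in observed]
        full[node] = sum(known) / len(known) if known else fallback
    return full
```

With `observed = {"A": 10.0, "C": 20.0}` and intersection `"B"` adjacent to both, `"B"` is assigned 15.0, and an RL controller could then act on the completed state dictionary.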
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Reinforcement learning
2038
LION: Label Disambiguation for Semi-supervised Facial Expression Recognition with Progressive Negative Learning
Semi-supervised deep facial expression recognition (SS-DFER) has recently attracted rising research interest due to its more practical setting of abundant unlabeled data. However, there are two main problems that current SS-DFER methods do not consider: 1) label ambiguity, i.e., given labels that do not match the facial expressions; 2) inefficient utilization of unlabeled data with low confidence. In this paper, we propose a novel SS-DFER method, including a Label DIsambiguation module and a PrOgressive Negative Learning module, namely LION, to simultaneously address both problems. Specifically, the label disambiguation module operates on labeled data, including data with accurate labels (clear data) and ambiguous labels (ambiguous data). It first uses clear data to calculate prototypes for all the expression classes, and then re-assigns a candidate label set to all the ambiguous data. Based on the prototypes and the candidate label set, the ambiguous data can be relabeled more accurately. As for unlabeled data with low confidence, the progressive negative learning module is developed to iteratively mine more complete complementary labels, which can guide the model to reduce the association between data and corresponding complementary labels. Experiments on three challenging datasets show that our method significantly outperforms the current state-of-the-art approaches in SS-DFER and surpasses fully-supervised baselines. Code will be available at https://github.com/NUM-7/LION.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Applications
2045
FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment
Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such methods require clients to download others’ models. This not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by the local one-step-ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets, and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.
List of keywords
Machine Learning -> ML: Federated learning
2048
Video Diffusion Models with Local-Global Context Guidance
Diffusion models have emerged as a powerful paradigm in video synthesis tasks including prediction, generation, and interpolation. Due to the limitation of the computational budget, existing methods usually implement conditional diffusion models with an autoregressive inference pipeline, in which the future fragment is predicted based on the distribution of adjacent past frames. However, conditions from only a few previous frames cannot capture the global temporal coherence, leading to inconsistent or even outrageous results in long-term video prediction. In this paper, we propose a Local-Global Context guided Video Diffusion model (LGC-VD) to capture multi-perception conditions for producing high-quality videos in both conditional and unconditional settings. In LGC-VD, the UNet is implemented with stacked residual blocks with self-attention units, avoiding the undesirable computational cost of 3D Conv. We construct a local-global context guidance strategy to capture the multi-perceptual embedding of the past fragment to boost the consistency of future prediction. Furthermore, we propose a two-stage training strategy to alleviate the effect of noisy frames for more stable predictions. Our experiments demonstrate that the proposed method achieves favorable performance on video prediction, interpolation, and unconditional video generation. We release code at https://github.com/exisas/LGC-VD.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Video analysis and understanding
2051
Continuous-Time Graph Learning for Cascade Popularity Prediction
Information propagation on social networks could be modeled as cascades, and many efforts have been made to predict the future popularity of cascades. However, most of the existing research treats a cascade as an individual sequence. Actually, the cascades might be correlated with each other due to the shared users or similar topics. Moreover, the preferences of users and semantics of a cascade are usually continuously evolving over time. In this paper, we propose a continuous-time graph learning method for cascade popularity prediction, which first connects different cascades via a universal sequence of user-cascade and user-user interactions and then chronologically learns on the sequence by maintaining the dynamic states of users and cascades. Specifically, for each interaction, we present an evolution learning module to continuously update the dynamic states of the related users and cascade based on their currently encoded messages and previous dynamic states. We also devise a cascade representation learning component to embed the temporal information and structural information carried by the cascade. Experiments on real-world datasets demonstrate the superiority and rationality of our approach.
List of keywords
Data Mining -> DM: Mining text, web, social media
Data Mining -> DM: Mining spatial and/or temporal data
2072
Learnable Surrogate Gradient for Direct Training Spiking Neural Networks
Spiking neural networks (SNNs) have increasingly drawn massive research attention due to biological interpretability and efficient computation. Recent achievements are devoted to utilizing the surrogate gradient (SG) method to avoid the dilemma of non-differentiability of spiking activity to directly train SNNs by backpropagation. However, the fixed width of the SG leads to gradient vanishing and mismatch problems, thus limiting the performance of directly trained SNNs. In this work, we propose a novel perspective to unlock the width limitation of SG, called the learnable surrogate gradient (LSG) method. The LSG method modulates the width of SG according to the change of the distribution of the membrane potentials, which is identified to be related to the decay factors based on our theoretical analysis. Then we introduce the trainable decay factors to implement the LSG method, which can optimize the width of SG automatically during training to avoid the gradient vanishing and mismatch problems caused by the limited width of SG. We evaluate the proposed LSG method on both image and neuromorphic datasets. Experimental results show that the LSG method can effectively alleviate the blocking of gradient propagation caused by the limited width of SG when training deep SNNs directly. Meanwhile, the LSG method can help SNNs achieve competitive performance on both latency and accuracy.
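To see why the width of a surrogate gradient matters, consider a rectangular surrogate centred on the firing threshold: only neurons whose membrane potential falls inside the window receive a non-zero gradient. The sketch below is a generic illustration and does not implement the LSG method or its decay-factor mechanism; the rectangular shape, the function names, and the threshold value are assumptions.

```python
from typing import List


def rect_surrogate_grad(u: float, v_th: float = 1.0, width: float = 1.0) -> float:
    """Rectangular surrogate for the derivative of the spike function.

    Returns 1/width inside the window |u - v_th| < width/2, else 0.
    """
    return 1.0 / width if abs(u - v_th) < width / 2.0 else 0.0


def active_fraction(potentials: List[float], v_th: float = 1.0,
                    width: float = 1.0) -> float:
    """Fraction of neurons that receive a non-zero surrogate gradient."""
    active = sum(1 for u in potentials
                 if rect_surrogate_grad(u, v_th, width) > 0.0)
    return active / len(potentials)


# Membrane potentials spread around the threshold of 1.0.
potentials = [0.0, 0.4, 0.9, 1.1, 1.6, 2.5]
narrow = active_fraction(potentials, width=1.0)  # window (0.5, 1.5)
wide = active_fraction(potentials, width=2.0)    # window (0.0, 2.0)
```

With a fixed narrow window, a shift in the distribution of membrane potentials can leave most neurons outside it, which is the gradient-vanishing situation the abstract describes; widening the window brings more neurons back into play at the cost of a smaller gradient magnitude per neuron.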
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems
2077
Fluid Dynamics-Inspired Network for Infrared Small Target Detection
Most infrared small target detection (ISTD) networks focus on building effective neural blocks or feature fusion modules, but none describes the ISTD process from the image evolution perspective. The directional evolution of image pixels influenced by convolution, pooling and surrounding pixels is analogous to the movement of fluid elements constrained by surrounding variables and particles. Inspired by this, we explore a novel research route by abstracting the movement of pixels in the ISTD process as the flow of fluid in fluid dynamics (FD). Specifically, a new Fluid Dynamics-Inspired Network (FDI-Net) is devised for ISTD. Based on the Taylor Central Difference (TCD) method, the TCD feature extraction block is designed, where convolution and Transformer structures are combined for local and global information. The pixel motion equation during the ISTD process is derived from the Navier–Stokes (N-S) equation, constructing an N-S Refinement Module that refines extracted features with edge details. Thus, the TCD feature extraction block determines the primary movement direction of pixels during detection, while the N-S Refinement Module corrects some skewed directions of the pixel stream to supplement the edge details. Experiments on IRSTD-1k and SIRST demonstrate that our method achieves SOTA performance in terms of evaluation metrics.
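For readers unfamiliar with the central difference scheme named above, the standard derivation from two Taylor expansions is reproduced here. It is textbook material for a smooth scalar function $f$ with step size $h$; how the paper maps this scheme onto network blocks is not stated in the abstract and is not assumed here.

```latex
\begin{align}
f(x+h) &= f(x) + h f'(x) + \tfrac{h^2}{2} f''(x) + \tfrac{h^3}{6} f'''(x) + O(h^4), \\
f(x-h) &= f(x) - h f'(x) + \tfrac{h^2}{2} f''(x) - \tfrac{h^3}{6} f'''(x) + O(h^4), \\
f(x+h) - f(x-h) &= 2h f'(x) + \tfrac{h^3}{3} f'''(x) + O(h^5), \\
f'(x) &= \frac{f(x+h) - f(x-h)}{2h} - \tfrac{h^2}{6} f'''(x) + O(h^4).
\end{align}
```

The even-order terms cancel in the subtraction, so the central difference is second-order accurate in $h$, whereas a one-sided difference is only first-order accurate.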
Computer Vision -> CV: Recognition (object detection, categorization)
List of keywords 
Computer Vision -> CV: Segmentation Computer Vision -> CV: Recognition (object detection, categorization)
2094
Video Frame Interpolation with Densely Queried Bilateral Correlation
Video Frame Interpolation (VFI) aims to synthesize non-existent intermediate frames between existent frames. Flow-based VFI algorithms estimate intermediate motion fields to warp the existent frames. Real-world motions’ complexity and the reference frame’s absence make motion estimation challenging. Many state-of-the-art approaches explicitly model the correlations between two neighboring frames for more accurate motion estimation. In common approaches, the receptive field of correlation modeling at higher resolution depends on the motion fields estimated beforehand. Such receptive field dependency makes common motion estimation approaches poor at coping with small and fast-moving objects. To better model correlations and to produce more accurate motion fields, we propose the Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem and thus is more friendly to small and fast-moving objects. The motion fields generated with the help of DQBC are further refined and up-sampled with context features. After the motion fields are fixed, a CNN-based SynthNet synthesizes the final interpolated frame. Experiments show that our approach enjoys higher accuracy and less inference time than the state-of-the-art. Source code is available at https://github.com/kinoud/DQBC.
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding
List of keywords 
Computer Vision -> CV: Computational photography Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding
2099
Robust Reinforcement Learning via Progressive Task Sequence
Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address the robust RL problem by solving a max-min problem. The main idea is to maximize the cumulative reward under the worst-possible perturbations. However, the worst-case optimization either leads to overly conservative solutions or unstable training process, which further affects the policy robustness and generalization performance. In this paper, we tackle this problem from both formulation definition and algorithm design. First, we formulate the robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both the worst cases and the non-worst cases. Then, we propose a novel framework DRRL to solve the max-expectation optimization. Given our definition of the feasible tasks, a task generation and sequencing mechanism is introduced to dynamically output tasks at appropriate difficulty level for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning to improve the policy robustness and the training stability. Finally, extensive experiments demonstrate that the proposed method exhibits significant performance on the unmanned CarRacing game and multiple high-dimensional MuJoCo environments.
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Safety and robustness Agent-based and Multi-agent Systems -> MAS: Agent theories and models
2103
Explainable Reinforcement Learning via a Causal World Model
Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
Machine Learning -> ML: Causality
Machine Learning -> ML: Reinforcement learning
List of keywords 
Machine Learning -> ML: Explainable/Interpretable machine learning Machine Learning -> ML: Causality
Machine Learning -> ML: Reinforcement learning
2106
Handling Learnwares Developed from Heterogeneous Feature Spaces without Auxiliary Data
The learnware paradigm proposed by Zhou [2016] is devoted to constructing a market of numerous well-performing models, enabling users to solve problems by reusing existing efforts rather than starting from scratch. A learnware comprises a trained model and a specification that enables the model to be adequately identified according to the user’s requirement. Previous studies concentrated on the homogeneous case where models share the same feature space based on the Reduced Kernel Mean Embedding (RKME) specification. However, in real-world scenarios, models are typically constructed from different feature spaces. If such a scenario can be handled by the market, all models built for a particular task, even with different feature spaces, can be identified and reused for a new user task. Generally, this problem would be easier if there were additional auxiliary data connecting different feature spaces; however, obtaining such data in reality is challenging. In this paper, we present a general framework for accommodating heterogeneous learnwares without requiring additional auxiliary data. The key idea is to utilize the submitted RKME specifications to establish the relationship between different feature spaces. Additionally, we give a matrix factorization-based implementation and propose the overall procedure for constructing and exploiting the heterogeneous learnware market. Experiments on real-world tasks validate the efficacy of our method.
Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Multi-task and transfer learning
List of keywords 
Machine Learning -> ML: Classification Machine Learning -> ML: Automated machine learning
Machine Learning -> ML: Multi-task and transfer learning
2118
Hawkes Process Based on Controlled Differential Equations
Hawkes processes are a popular framework to model the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics, but also ii) resort to heuristics to calculate the log-likelihood of events since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes process based on controlled differential equations (HP-CDE), by adopting the neural controlled differential equation (neural CDE) technology which is an analogue to continuous RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated preserving their uneven temporal spaces, and ii) the log-likelihood can be exactly computed. Moreover, as both Hawkes processes and neural CDEs are first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.
Data Mining -> DM: Mining text, web, social media
List of keywords 
Data Mining -> DM: Mining spatial and/or temporal data Data Mining -> DM: Mining text, web, social media
2120
Truthful Auctions for Automated Bidding in Online Advertising
Automated bidding, an emerging intelligent decision-making paradigm powered by machine learning, has become popular in online advertising. Advertisers in automated bidding evaluate the cumulative utilities and have private financial constraints over multiple ad auctions in a long-term period. Based on these distinct features, we consider a new ad auction model for automated bidding: the values of advertisers are public while the financial constraints, such as budget and return on investment (ROI) rate, are private types. We derive the truthfulness conditions with respect to private constraints for this multi-dimensional setting, and demonstrate that any feasible allocation rule could be equivalently reduced to a series of non-decreasing functions on budget. However, the resulting allocation mapped from these non-decreasing functions generally follows an irregular shape, making it difficult to obtain a closed-form expression for the auction objective. To overcome this design difficulty, we propose a family of truthful automated bidding auctions with personalized rank scores, similar to the Generalized Second-Price (GSP) auction. The intuition behind our design is to leverage personalized rank scores as the criteria to allocate items, and compute a critical ROI to transform the constraints on budget to the same dimension as ROI. The experimental results demonstrate that the proposed auction mechanism outperforms the widely used ad auctions, such as first-price auction and second-price auction, in various automated bidding environments.
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Mechanism design Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
2144
Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels
Deep neural networks have shown promising results on a wide variety of tasks using large-scale and well-annotated training datasets. However, data collected from real-world applications can suffer from two prevalent biases, i.e., long-tailed class distribution and label noise. Previous efforts on long-tailed learning and label-noise learning can only address a single type of data bias, leading to a severe deterioration of their performance. In this paper, we propose a distance-based sample selection algorithm called Stochastic Feature Averaging (SFA), which fits a Gaussian using the exponential running average of class centroids to capture uncertainty in representation space due to label noise and data scarcity. With SFA, we detect noisy samples based on their distances to class centroids sampled from this Gaussian distribution. Based on the identified clean samples, we then propose to train an auxiliary balanced classifier to improve the generalization for the minority class and facilitate the update of Gaussian parameters. Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. Further, we propose to combine SFA with the sample-selection approach, distribution-robust, and noise-robust loss functions, resulting in significant improvement in performance over the baselines. Our code is available at https://github.com/HotanLee/SFA
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning
List of keywords 
Machine Learning -> ML: Weakly supervised learning Machine Learning -> ML: Multi-label
Machine Learning -> ML: Semi-supervised learning
2147
On the Complexity of Counterfactual Reasoning
We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks—used in standard counterfactual reasoning that contemplates two worlds, real and imaginary—to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with partially specified SCMs that are coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.
Knowledge Representation and Reasoning -> KRR: Causality
Uncertainty in AI -> UAI: Bayesian networks
List of keywords 
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference Knowledge Representation and Reasoning -> KRR: Causality
Uncertainty in AI -> UAI: Bayesian networks
2160
Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose
Domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, to a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations of regression tasks. Moreover, learning unified implicit neural functions to estimate relative direction and distance of targets to their nearest class bins aims to refine target classification predictions, which can gain robust performance against inconsistent feature scaling sensitive to UDA regressors. Experimental results on three public benchmarks of the challenging 6D pose estimation task verify the effectiveness of our method, consistently achieving superior performance to the state-of-the-art for UDA on 6D pose estimation. Codes and pre-trained models are available at https://github.com/Gorilla-Lab-SCUT/MAST.
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
2178
On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling
This paper studies the use of Curriculum Learning on Reinforcement Learning (RL) to improve the performance of the dispatching policies learned on the Job-shop Scheduling Problem (JSP). Current works in the literature present a large optimality gap when learning end-to-end solutions on this problem. In this regard, we identify the difficulty for RL to learn directly on large instances as part of the issue and use Curriculum Learning (CL) to mitigate this effect. In particular, CL sequences the learning process in a curriculum of tasks of increasing complexity, which allows learning on large instances that otherwise would be impossible to learn from scratch. In this paper, we present a size-agnostic model that enables us to demonstrate that current curriculum strategies have a major impact on the quality of the solution inferred. In addition, we introduce a novel Reinforced Adaptive Staircase Curriculum Learning (RASCL) strategy, which adjusts the difficulty level during the learning process by revisiting the worst-performing instances. Experiments conducted on Taillard’s and Demirkol’s datasets show that the presented approach significantly improves on the current state-of-the-art models on the JSP. It reduces the average optimality gap from 19.35% to 10.46% on Taillard’s instances and from 38.43% to 18.85% on Demirkol’s instances.
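To put the reported gaps in context, here is a small worked calculation. It assumes the usual definition of optimality gap for scheduling, (makespan - best known makespan) / best known makespan, which the abstract does not spell out; the function names and the example makespans are hypothetical, while the four percentages are the ones quoted above.

```python
def optimality_gap(makespan, best_known):
    """Assumed definition: relative excess over the best known makespan."""
    return (makespan - best_known) / best_known


def relative_reduction(gap_before, gap_after):
    """Share of the previous gap that has been closed."""
    return (gap_before - gap_after) / gap_before


# Hypothetical instance: a schedule of length 1100 against a best known 1000.
example_gap = optimality_gap(1100, 1000)

# Average gaps quoted in the abstract, in percent.
taillard = relative_reduction(19.35, 10.46)
demirkol = relative_reduction(38.43, 18.85)
print(f"Taillard: {taillard:.1%} of the gap closed")
print(f"Demirkol: {demirkol:.1%} of the gap closed")
```

Under this reading, the reported numbers correspond to closing a little under half of the previous gap on Taillard's instances and about half on Demirkol's.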
Planning and Scheduling -> PS: Scheduling
List of keywords 
Planning and Scheduling -> PS: Learning in planning and scheduling Planning and Scheduling -> PS: Scheduling
2184
Exploring Safety Supervision for Continual Test-time Domain Adaptation
Continual test-time domain adaptation aims to adapt a source pre-trained model to a continually changing target domain without using any source data. Unfortunately, existing methods based on pseudo-label learning suffer from the changing target domain environment, and the quality of generated pseudo-labels is attenuated due to the domain shift, leading to instantaneous negative learning and long-term knowledge forgetting. To solve these problems, in this paper, we propose a simple yet effective framework for exploring safety supervision with three elaborate strategies: Label Safety, Sample Safety, and Parameter Safety. Firstly, to select reliable pseudo-labels, we define and adjust the confidence threshold in a self-adaptive manner according to the test-time learning status. Secondly, a soft-weighted contrastive learning module is presented to explore the highly-correlated samples and discriminate uncorrelated ones, improving the instantaneous efficiency of the model. Finally, we frame a Soft Weight Alignment strategy to normalize the distance between the parameters of the adapted model and the source pre-trained model, which alleviates the long-term problem of knowledge forgetting and significantly improves the accuracy of the adapted model in the late adaptation stage. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on several benchmark datasets.
Computer Vision -> CV: Representation learning
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning Computer Vision -> CV: Representation learning
2193
Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences
Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable in real applications, especially when dealing with discrete-time event sequences at low resolution; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., the causal relationship that occurred simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structural Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among event types in discrete-time event sequences. The proposed method features Expectation-Maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.
List of keywords 
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
2225
Privacy-Preserving End-to-End Spoken Language Understanding
Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can contain a lot of user-sensitive information, such as gender, identity, and sensitive content. New types of security and privacy breaches have thus emerged. Users do not want to expose their personal sensitive information to malicious attacks by untrusted third parties. Thus, the SLU system needs to ensure that a potential malicious attacker cannot deduce the sensitive attributes of the users, while it should avoid greatly compromising the SLU accuracy. To address the above challenge, this paper proposes a novel SLU multi-task privacy-preserving model to prevent both the speech recognition (ASR) and identity recognition (IR) attacks. The model uses the hidden layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, and the other two types of information are removed to obtain a privacy-secure hidden layer. In order to achieve good balance between efficiency and privacy, we introduce a new mechanism of model pre-training, namely joint adversarial training, to further enhance the user privacy. Experiments over two SLU datasets show that the proposed method can reduce the accuracy of both the ASR and IR attacks close to that of a random guess, while leaving the SLU performance largely unaffected.
Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Speech
List of keywords 
Natural Language Processing -> NLP: Dialogue and interactive systems Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Speech
2228
CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation
The hybrid architecture of convolutional neural networks (CNNs) and Transformers is very popular for medical image segmentation. However, it suffers from two challenges. First, although a CNN branch can capture the local image features using vanilla convolution, it cannot achieve adaptive feature learning. Second, although a Transformer branch can capture the global features, it ignores the channel and cross-dimensional self-attention, resulting in a low segmentation accuracy on complex-content images. To address these challenges, we propose a novel hybrid architecture of convolutional neural networks hand in hand with vision Transformers (CiT-Net) for medical image segmentation. Our network has two advantages. First, we design a dynamic deformable convolution and apply it to the CNN branch, which overcomes the weak feature extraction ability due to fixed-size convolution kernels and the stiff design of sharing kernel parameters among different inputs. Second, we design a shifted-window adaptive complementary attention module and a compact convolutional projection. We apply them to the Transformer branch to learn the cross-dimensional long-term dependency for medical images. Experimental results show that our CiT-Net provides better medical image segmentation results than popular SOTA methods. Besides, our CiT-Net requires fewer parameters and lower computational cost and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/CiT-Net.
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision
List of keywords 
Computer Vision -> CV: Biomedical image analysis Computer Vision -> CV: Segmentation
Computer Vision -> CV: Machine learning for vision
2229
On Lower Bounds for Maximin Share Guarantees
We study the problem of fairly allocating a set of indivisible items to a set of agents with additive valuations. Recently, Feige et al. (WINE’21) proved that a maximin share (MMS) allocation exists for all instances with n agents and no more than n + 5 items. Moreover, they proved that an MMS allocation is not guaranteed to exist for instances with 3 agents and at least 9 items, or n ≥ 4 agents and at least 3n + 3 items. In this work, we shrink the gap between these upper and lower bounds for guaranteed existence of MMS allocations. We prove that for any integer c > 0, there exists a number of agents n_c such that an MMS allocation exists for any instance with n ≥ n_c agents and at most n + c items, where n_c ≤ ⌊0.6597^c · c!⌋ for allocation of goods and n_c ≤ ⌊0.7838^c · c!⌋ for chores. Furthermore, we show that for n ≠ 3 agents, all instances with n + 6 goods have an MMS allocation.
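For readers unfamiliar with the notion, the maximin share of an agent is the best value the agent can guarantee by splitting the items into n bundles and receiving the least valuable one. The brute-force sketch below only illustrates that definition for goods with additive valuations on tiny instances; it is not the paper's technique, and the function names and example values are hypothetical.

```python
from itertools import product


def maximin_share(values, n_agents):
    """MMS of one agent: maximum, over all ways of splitting the items into
    n_agents bundles, of the value of the least valuable bundle."""
    best = 0
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for item_value, bundle in zip(values, assignment):
            bundles[bundle] += item_value
        best = max(best, min(bundles))
    return best


def is_mms_allocation(valuations, allocation):
    """valuations[i][j] is agent i's value for item j; allocation[i] lists
    the items given to agent i. True if every agent gets at least its MMS."""
    n = len(valuations)
    return all(
        sum(valuations[i][j] for j in bundle) >= maximin_share(valuations[i], n)
        for i, bundle in enumerate(allocation)
    )


values = [7, 5, 4, 3, 1]
mms_two = maximin_share(values, 2)    # split as {7, 3} and {5, 4, 1}
mms_three = maximin_share(values, 3)  # split as {7}, {5, 1}, {4, 3}
print(mms_two, mms_three)
```

The enumeration visits n^m assignments, so it is only usable for a handful of items; the existence results in the abstract concern instances with at most n + c items, where such exhaustive checks are not how the bounds are obtained.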
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division
2230
Optimal Decision Trees For Interpretable Clustering with Constraints
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions; however, existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Machine Learning -> ML: Clustering
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint optimization Constraint Satisfaction and Optimization -> CSO: Satisfiability
Machine Learning -> ML: Clustering
2241
Uncovering the Largest Community in Social Networks at Scale
The Maximum k-Plex Search (MPS) can find the largest k-plex, which is a generalization of the largest clique. Although MPS is commonly used in AI to effectively discover real-world communities of social networks, existing MPS algorithms suffer from high computational costs because they iteratively scan numerous nodes to find the largest k-plex. Here, we present an efficient MPS algorithm called Branch-and-Merge (BnM), which outputs an exact maximum k-plex. BnM merges unnecessary nodes to explore a smaller graph than the original one. Extensive evaluations on real-world social networks demonstrate that BnM significantly outperforms other state-of-the-art MPS algorithms in terms of running time.
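As background for the abstract above, a k-plex is a vertex set S in which every vertex is adjacent to at least |S| - k other vertices of S, so a 1-plex is a clique. The sketch below checks that definition and finds a maximum k-plex by exhaustive search on a toy graph. It is a naive baseline for illustration only, not the Branch-and-Merge algorithm, and the function names and example graph are hypothetical.

```python
from itertools import combinations


def is_k_plex(adj, subset, k):
    """True if every vertex of subset has at least |subset| - k neighbours
    inside subset (adj maps each vertex to its set of neighbours)."""
    s = set(subset)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def max_k_plex_bruteforce(adj, k):
    """Exhaustive search from the largest candidate size downwards.
    Exponential in the number of vertices, so only for tiny graphs."""
    nodes = sorted(adj)
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            if is_k_plex(adj, candidate, k):
                return set(candidate)
    return set()


# A 4-cycle 0-1-2-3-0: its largest clique is a single edge, but the whole
# cycle is a 2-plex because each vertex misses only one other vertex.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
largest_clique = max_k_plex_bruteforce(cycle, 1)
largest_2_plex = max_k_plex_bruteforce(cycle, 2)
print(len(largest_clique), len(largest_2_plex))
```

The gap between this exhaustive search and what is feasible on real social networks is the computational cost that motivates pruning and merging strategies such as the one the abstract describes.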
Data Mining -> DM: Applications
List of keywords 
Data Mining -> DM: Mining text, web, social media Data Mining -> DM: Applications
2250
DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series
Time series forecasting is prevalent in various real-world applications. Despite the promising results of deep learning models in time series forecasting, especially the Recurrent Neural Networks (RNNs), the explanations of time series models, which are critical in high-stakes applications, have received little attention. In this paper, we propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM. Conventionally, the interpretability of RNNs only concentrates on the variable importance and time importance. We additionally distinguish between the instantaneous influence of new coming data and the long-term effects of historical data. Specifically, DeLELSTM consists of two components, i.e., standard LSTM and tensorized LSTM. The tensorized LSTM assigns each variable with a unique hidden state making up a matrix h(t), and the standard LSTM models all the variables with a shared hidden state H(t). By decomposing the H(t) into the linear combination of past information h(t-1) and the fresh information h(t)-h(t-1), we can get the instantaneous influence and the long-term effect of each feature. In addition, the advantage of linear regression also makes the explanation transparent and clear. We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets. Extensive experiments show that the proposed method achieves competitive performance against the baseline methods and provides a reliable explanation relative to domain knowledge.
Machine Learning -> ML: Time series and data streams
List of keywords 
Machine Learning -> ML: Explainable/Interpretable machine learning Machine Learning -> ML: Time series and data streams
2261
Contour-based Interactive Segmentation
Recent advances in interactive segmentation (IS) allow speeding up and simplifying image editing and labeling greatly. The majority of modern IS approaches accept user input in the form of clicks. However, using clicks may require too many user interactions, especially when selecting small objects, minor parts of an object, or a group of objects of the same type. In this paper, we consider such a natural form of user interaction as a loose contour, and introduce a contour-based IS method. We evaluate the proposed method on the standard segmentation benchmarks, our novel UserContours dataset, and its subset UserContours-G containing difficult segmentation cases. Through experiments, we demonstrate that a single contour provides the same accuracy as multiple clicks, thus reducing the required amount of user interactions.
Computer Vision -> CV: Machine learning for vision
List of keywords 
Computer Vision -> CV: Segmentation Computer Vision -> CV: Machine learning for vision
2271
VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data
Due to the insufficiency of electronic health records (EHR) data utilized in practical diagnosis prediction scenarios, most works are devoted to learning powerful patient representations either from structured EHR data (e.g., temporal medical events, lab test results, etc.) or unstructured data (e.g., clinical notes, etc.). However, synthesizing rich information from both of them still needs to be explored. Firstly, the heterogeneous semantic biases across them heavily hinder the synthesis of representation spaces, which is critical for diagnosis prediction. Secondly, the intermingled quality of partial clinical notes leads to inadequate representations of to-be-predicted patients. Thirdly, typical attention mechanisms mainly focus on aggregating information from similar patients, ignoring important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, which indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.
Machine Learning -> ML: Representation learning
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Health and medicine Machine Learning -> ML: Representation learning
2272
Adaptive Estimation Q-learning with Uncertainty and Familiarity
One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimations. The most widely used off-policy algorithms suffer from over- or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control the value estimation naturally and can adapt to each specific state-action pair. We theoretically prove a property of our familiarity term that can even keep the expected estimation bias approximately 0, and experimentally demonstrate that our dynamic estimation can improve performance and prevent the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied in any off-policy actor-critic algorithm.
Machine Learning -> ML: Ensemble methods
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Machine Learning -> ML: Ensemble methods
2276
Commonsense Knowledge Enhanced Sentiment Dependency Graph for Sarcasm Detection
Sarcasm is widely utilized on social media platforms such as Twitter and Reddit. Sarcasm detection is required for analyzing people’s true feelings since sarcasm is commonly used to portray a reversed emotion opposing the literal meaning. The syntactic structure is the key to making better use of commonsense when detecting sarcasm. However, it is extremely challenging to effectively and explicitly explore the information implied in syntactic structure and commonsense simultaneously. In this paper, we apply the pre-trained COMET model to generate relevant commonsense knowledge, and explore a novel scenario of constructing a commonsense-augmented sentiment graph and a commonsense-replaced dependency graph for each text. Based on this, a Commonsense Sentiment Dependency Graph Convolutional Network (CSDGCN) framework is proposed to explicitly depict the role of external commonsense and inconsistent expressions over the context for sarcasm detection by interactively modeling the sentiment and dependency information. Experimental results on several benchmark datasets reveal that our proposed method beats the state-of-the-art methods in sarcasm detection, and has stronger interpretability.
Machine Learning -> ML: Knowledge-aided learning
Natural Language Processing -> NLP: Text classification
List of keywords 
Data Mining -> DM: Mining graphs Machine Learning -> ML: Knowledge-aided learning
Natural Language Processing -> NLP: Text classification
2296
Complete Instances Mining for Weakly Supervised Instance Segmentation
Weakly supervised instance segmentation (WSIS) using only image-level labels is a challenging task due to the difficulty of aligning coarse annotations with the finer task. However, with the advancement of deep neural networks (DNNs), WSIS has garnered significant attention. Following a proposal-based paradigm, we encounter a redundant segmentation problem resulting from a single instance being represented by multiple proposals. For example, we feed a picture of a dog and proposals into the network and expect to output only one proposal containing a dog, but the network outputs multiple proposals. To address this problem, we propose a novel approach for WSIS that focuses on the online refinement of complete instances through the use of MaskIoU heads to predict the integrity scores of proposals and a Complete Instances Mining (CIM) strategy to explicitly model the redundant segmentation problem and generate refined pseudo labels. Our approach allows the network to become aware of multiple instances and complete instances, and we further improve its robustness through the incorporation of an Anti-noise strategy. Empirical evaluations on the PASCAL VOC 2012 and MS COCO datasets demonstrate that our method achieves state-of-the-art performance with a notable margin. Our implementation will be made available at https://github.com/ZechengLi19/CIM.
Computer Vision -> CV: Segmentation
Machine Learning -> ML: Weakly supervised learning
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning    Computer Vision -> CV: Segmentation
Machine Learning -> ML: Weakly supervised learning
2309
Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as an image captioning problem, i.e., the input is a single-table image and the output is a table structure description in a specific text format, e.g., HTML. With the impressive success of Transformers in text generation tasks, these methods use the Transformer architecture to predict HTML table text in an autoregressive manner. However, tables come in a large variety of shapes and sizes. Autoregressive models usually suffer from the error accumulation problem as the length of the predicted text increases, which results in unsatisfactory performance for large tables. In this paper, we propose a novel image-to-text based TSR method that relieves error accumulation problems and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting HTML table cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates an HTML tag sequence with a vertical-and-horizontal two-step 'scanning', which better fits the inherent 2D structure of image data, and (2) it performs substantially better for large tables (long sequence prediction) since it alleviates the error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Applications
2321
Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization
The key to video action detection lies in the understanding of interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimate the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full utilization of it still remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers with different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relation across continuing video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Video analysis and understanding
List of keywords 
Computer Vision -> CV: Action and behavior recognition Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Video analysis and understanding
2338
CSGCL: Community-Strength-Enhanced Graph Contrastive Learning
Graph Contrastive Learning (GCL) is an effective way to learn generalized graph representations in a self-supervised manner, and has grown rapidly in recent years. However, the underlying community semantics has not been well explored by most previous GCL methods. Research that attempts to leverage communities in GCL regards them as having the same influence on the graph, leading to extra representation errors. To tackle this issue, we define "community strength" to measure the difference of influence among communities. Under this premise, we propose a Community-Strength-enhanced Graph Contrastive Learning (CSGCL) framework to preserve community strength throughout the learning process. Firstly, we present two novel graph augmentation methods, Communal Attribute Voting (CAV) and Communal Edge Dropping (CED), where the perturbations of node attributes and edges are guided by community strength. Secondly, we propose a dynamic "Team-up" contrastive learning scheme, where community strength is used to progressively fine-tune the contrastive objective. We report extensive experiment results on three downstream tasks: node classification, node clustering, and link prediction. CSGCL achieves state-of-the-art performance compared with other GCL methods, validating that community strength brings effectiveness and generality to graph representations. Our code is available at https://github.com/HanChen-HUST/CSGCL.
Machine Learning -> ML: Self-supervised Learning
List of keywords 
Data Mining -> DM: Mining graphs Machine Learning -> ML: Self-supervised Learning
2339
Generalized Discriminative Deep Non-Negative Matrix Factorization Based on Latent Feature and Basis Learning
As a powerful tool for data representation, deep NMF has attracted much attention in recent years. Current deep NMF builds the multi-layer structure by decomposing either the basis matrix or the feature matrix into multiple factors, and probably complicates the learning process when data is insufficient or exhibits simple structure. To overcome these limitations, a novel method called Generalized Deep Non-negative Matrix Factorization (GDNMF) is proposed, which generalizes several NMF and deep NMF methods in a unified framework. GDNMF simultaneously performs decomposition on both features and bases, which learns a hierarchical data representation based on a multi-level basis. To further improve the latent representation and enhance its flexibility, GDNMF mutually reinforces a shallow linear model and a deep non-linear model. Moreover, semi-supervised GDNMF is proposed by treating partial label information as soft constraints in the multi-layer structure. An efficient two-phase optimization algorithm is developed, and experiments on five real-world datasets verify its superior performance compared with state-of-the-art methods.
Machine Learning -> ML: Weakly supervised learning
List of keywords 
Machine Learning -> ML: Clustering Machine Learning -> ML: Weakly supervised learning
2358
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task. However, lacking the perception of linguistic knowledge and information, recent vision models suffer from two problems: (1) the pure vision-based query results in attention drift, which usually causes poor recognition and is summarized as the linguistic insensitive drift (LID) problem in this paper; (2) the visual feature is suboptimal for recognition in some vision-missing cases (e.g., occlusion). To address these issues, we propose a Linguistic Perception Vision model (LPV), which explores the linguistic capability of vision models for accurate text recognition. To alleviate the LID problem, we introduce a Cascade Position Attention (CPA) mechanism that obtains high-quality and accurate attention maps through step-wise optimization and linguistic information mining. Furthermore, a Global Linguistic Reconstruction Module (GLRM) is proposed to improve the representation of visual features by perceiving the linguistic information in the visual space, which gradually converts visual features into semantically rich ones during the cascade process. Different from previous methods, our method obtains SOTA results while keeping low complexity (92.4% accuracy with only 8.11M parameters). Code is available at https://github.com/CyrilSterling/LPV.
Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Vision and language
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Scene analysis and understanding
Computer Vision -> CV: Vision and language
2362
G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real-life, models are often applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, G2Pxy does not have specific requirement on the GNN architecture and shows good generalizations.
Data Mining -> DM: Mining graphs
List of keywords 
Machine Learning -> ML: Classification Data Mining -> DM: Mining graphs
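The G2Pxy abstract above says that proxy unknown nodes are generated via mixup. As a rough illustration of generic mixup only (not the authors' implementation), the sketch below interpolates feature vectors of two nodes from different known classes. The function names `mixup` and `make_proxies`, the Beta(alpha, alpha) mixing weight, and the rule of pairing nodes from different classes are all assumptions made for this example.

```python
import random


def mixup(x_a, x_b, lam):
    """Convex combination lam * x_a + (1 - lam) * x_b of two feature vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def make_proxies(features, labels, n_proxies, rng, alpha=1.0):
    """Hypothetical helper: build proxy samples by mixing nodes of different classes.

    Assumes `labels` contains at least two distinct classes; otherwise no
    valid pair exists and the loop below would never terminate.
    """
    proxies = []
    while len(proxies) < n_proxies:
        i, j = rng.sample(range(len(features)), 2)
        if labels[i] == labels[j]:
            continue  # assumption: only mix across class boundaries
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        proxies.append(mixup(features[i], features[j], lam))
    return proxies


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
    labs = [0, 0, 1, 1]
    for p in make_proxies(feats, labs, 3, random.Random(0)):
        print(p)
```

Each proxy lies on the line segment between two labeled nodes of different classes, which is one simple way to place synthetic samples between known class regions.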
2366
Enabling Abductive Learning to Exploit Knowledge Graph
Most systems integrating data-driven machine learning with knowledge-driven reasoning usually rely on a specifically designed knowledge base to enable efficient symbolic inference. However, it could be cumbersome for the nonexpert end-users to prepare such a knowledge base in real tasks. Recent years have witnessed the success of large-scale knowledge graphs, which could be ideal domain knowledge resources for real-world machine learning tasks. However, these large-scale knowledge graphs usually contain much information that is irrelevant to a specific learning task. Moreover, they often contain a certain degree of noise. Existing methods can hardly make use of them because the large-scale probabilistic logical inference is usually intractable. To address these problems, we present ABductive Learning with Knowledge Graph (ABL-KG) that can automatically mine logic rules from knowledge graphs during learning, using a knowledge forgetting mechanism for filtering out irrelevant information. Meanwhile, these rules can form a logic program that enables efficient joint optimization of the machine learning model and logic inference within the Abductive Learning (ABL) framework. Experiments on four different tasks show that ABL-KG can automatically extract useful rules from large-scale and noisy knowledge graphs, and significantly improve the performance of machine learning with only a handful of labeled data.
Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Machine Learning -> ML: Weakly supervised learning
List of keywords 
Machine Learning -> ML: Knowledge-aided learning Knowledge Representation and Reasoning -> KRR: Diagnosis and abductive reasoning
Machine Learning -> ML: Weakly supervised learning
2409
DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency
In this paper, we propose a simple yet effective transformer framework for self-supervised learning called DenseDINO to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but that is neglected by existing self-supervised transformers, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces some extra input tokens called reference tokens to match the point-level features with the position prior. With the reference tokens, the model can maintain spatial consistency and deal with complex multi-object scene images, thus generalizing better on dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large improvement (+7.2% mIoU) in semantic segmentation on PascalVOC under the linear probing protocol for segmentation.
Computer Vision -> CV: Representation learning
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning    Computer Vision -> CV: Representation learning
2446
Active Visual Exploration Based on Attention-Map Entropy
Active visual exploration addresses the issue of limited sensor capabilities in real-world scenarios, where successive observations are actively chosen based on the environment. To tackle this problem, we introduce a new technique called Attention-Map Entropy (AME). It leverages the internal uncertainty of the transformer-based model to determine the most informative observations. In contrast to existing solutions, it does not require additional loss components, which simplifies the training. Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.
Machine Learning -> ML: Attention models
Robotics -> ROB: Robotics and vision
List of keywords 
Computer Vision -> CV: Machine learning for vision Machine Learning -> ML: Attention models
Robotics -> ROB: Robotics and vision
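The Attention-Map Entropy abstract above describes using the internal uncertainty of a transformer to choose the next observation. A minimal sketch of that general idea follows, assuming that uncertainty is measured as the Shannon entropy of softmax attention distributions and that the candidate with the highest entropy is selected. Both the selection rule and the names `softmax`, `entropy`, and `most_uncertain_row` are assumptions for illustration, not the paper's procedure.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain_row(attention_logits):
    """Return (index, entropies): the row whose attention is most spread out.

    Each row holds the raw attention scores of one hypothetical candidate
    region; a flatter distribution is treated here as higher uncertainty.
    """
    entropies = [entropy(softmax(row)) for row in attention_logits]
    best = max(range(len(entropies)), key=entropies.__getitem__)
    return best, entropies


if __name__ == "__main__":
    logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 5.0, 0.0]]
    index, values = most_uncertain_row(logits)
    print(index, values)
```

Because the score is computed from attention weights the model already produces, a rule of this kind needs no extra loss term, which matches the motivation stated in the abstract.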
2451
Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning
Sharing intentions is crucial for efficient cooperation in communication-enabled multi-agent reinforcement learning. Recent work applies static or undirected graphs to determine the order of interaction. However, the static graph is not general for complex cooperative tasks, and the parallel message-passing update in the undirected graph with cycles cannot guarantee convergence. To solve this problem, we propose Deep Hierarchical Communication Graph (DHCG) to learn the dependency relationships between agents based on their messages. The relationships are formulated as directed acyclic graphs (DAGs), where the selection of the proper topology is viewed as an action and trained in an end-to-end fashion. To eliminate the cycles in the graph, we apply an acyclicity constraint as intrinsic rewards and then project the graph in the admissible solution set of DAGs. As a result, DHCG removes redundant communication edges for cost improvement and guarantees convergence. To show the effectiveness of the learned graphs, we propose policy-based and value-based DHCG. Policy-based DHCG factorizes the joint policy in an auto-regressive manner, and value-based DHCG factorizes the joint value function to individual value functions and pairwise payoff functions. Empirical results show that our method improves performance across various cooperative multi-agent tasks, including Predator-Prey, Multi-Agent Coordination Challenge, and StarCraft Multi-Agent Challenge.
Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning Agent-based and Multi-agent Systems -> MAS: Agent communication
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
2459
Scalable Coupling of Deep Learning with Logical Reasoning
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. We empirically show that our loss function is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs, such as the symbolic, visual or many-solutions Sudoku problems as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and a posteriori control over predictions.
Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
List of keywords 
Machine Learning -> ML: Neuro-symbolic methods Constraint Satisfaction and Optimization -> CSO: Constraint learning and acquisition
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
2466
CONGREGATE: Contrastive Graph Clustering in Curvature Spaces
Graph clustering is a longstanding research topic, and has achieved remarkable success with deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach where we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.
Machine Learning -> ML: Clustering
List of keywords 
Data Mining -> DM: Mining graphs Machine Learning -> ML: Clustering
2470
LGI-GT: Graph Transformers with Local and Global Operators Interleaving
Since Transformers can alleviate some critical and fundamental problems of graph neural networks (GNNs), such as over-smoothing, over-squashing and limited expressiveness, they have been successfully applied to graph representation learning and achieved impressive results. However, although there are many works dedicated to making graph Transformers (GTs) aware of the structure and edge information by specifically tailored attention forms or graph-related positional and structural encodings, few works address the problem of how to construct high-performing GTs with modules of GNNs and Transformers. In this paper, we propose a novel graph Transformer with local and global operators interleaving (LGI-GT), in which we further design a new method for propagating embeddings of the [CLS] token for global information representation. Additionally, we propose an effective message passing module called edge enhanced local attention (EELA), which makes LGI-GT a full-attention GT. Extensive experiments demonstrate that LGI-GT performs consistently better than previous state-of-the-art GNNs and GTs, while ablation studies show the effectiveness of the proposed LGI scheme and EELA. The source code of LGI-GT is available at https://github.com/shuoyinn/LGI-GT.
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
List of keywords 
Machine Learning -> ML: Sequence and graph learning Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
2471
Reverse Engineering of Temporal Queries Mediated by LTL Ontologies
In reverse engineering of database queries, we aim to construct a query from a given set of answers and non-answers; it can then be used to explore the data further or as an explanation of the answers and non-answers. We investigate this query-by-example problem for queries formulated in positive fragments of linear temporal logic LTL over timestamped data, focusing on the design of suitable query languages and the combined and data complexity of deciding whether there exists a query in the given language that separates the given answers from non-answers. We consider both plain LTL queries and those mediated by LTL ontologies.
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
2479
U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation
Local context capturing has become the core factor for achieving leading performance in two-view correspondence learning. Recent advances have devised various local context extractors but typically adopt explicit neighborhood relation modeling that is restricted and inflexible. To address this issue, we introduce U-Match, an attentional graph neural network that has the flexibility to enable implicit local context awareness at multiple levels. Specifically, a hierarchy-aware graph representation (HAGR) module is designed and fleshed out by local context pooling and unpooling operations. The former encodes local context by adaptively sampling a set of nodes to form a coarse-grained graph, while the latter decodes local context by recovering the coarsened graph back to its original size. Moreover, an orthogonal fusion module is proposed for the collaborative use of the HAGR module, which integrates complementary local and global information into compact feature representations without redundancy. Extensive experiments on different visual tasks show that our method significantly surpasses the state of the art. In particular, U-Match attains an AUC at 5 degree threshold of 60.53% on the challenging YFCC100M dataset without RANSAC, outperforming the strongest prior model by 8.61 absolute percentage points. Our code is publicly available at https://github.com/ZizhuoLi/U-Match.
Computer Vision -> CV: Image and video retrieval
List of keywords 
Computer Vision -> CV: Motion and tracking Computer Vision -> CV: Image and video retrieval
2490
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Transformers achieve promising performance in document understanding because of their high effectiveness, but still suffer from computational complexity that is quadratic in the sequence length. General efficient transformers are difficult to adapt directly to document modeling. They are unable to handle layout representations in documents at different levels of granularity (e.g., word, line, and paragraph) and struggle to achieve a good trade-off between efficiency and performance. To tackle these concerns, we propose Fast-StrucTexT, an efficient multi-modal framework based on the StrucTexT algorithm with an hourglass transformer architecture, for visual document understanding. Specifically, we design a modality-guided dynamic token merging block to make the model learn multi-granularity representations and prune redundant tokens. Additionally, we present a multi-modal interaction module called Symmetry Cross-Attention (SCA) to consider multi-modal fusion and efficiently guide the token merging. The SCA allows one modality input as query to calculate cross attention with another modality in a dual phase. Extensive experiments on the FUNSD, SROIE, and CORD datasets demonstrate that our model achieves state-of-the-art performance and almost 1.9x faster inference than the state-of-the-art methods.
Machine Learning -> ML: Multi-modal learning
List of keywords 
Natural Language Processing -> NLP: Information extraction Machine Learning -> ML: Multi-modal learning
2491
XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving
Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. Nowadays, we can easily access sensor data represented in multiple views. However, cross-view consistency has not been evaluated by the existing models, which might lead to divergences between the multimodal predictions from different views. A network that does not comprehend the 3D scene is neither practical nor effective, and could leave the downstream module in a dilemma. Instead, we predict multimodal trajectories while maintaining cross-view consistency. We present a cross-view trajectory prediction method using shared 3D Queries (XVTP3D). We employ a set of 3D queries shared across views to generate multi-goals that are cross-view consistent. We also propose a random mask method and coarse-to-fine cross-attention to capture robust cross-view features. As far as we know, this is the first work that introduces the outstanding top-down paradigm in the BEV detection field to a trajectory prediction problem. The results of experiments on two publicly available datasets show that XVTP3D achieves state-of-the-art performance with consistent cross-view predictions.
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Machine learning for vision
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
Computer Vision -> CV: Machine learning for vision
2511
Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer
Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. In order to help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or by using end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence the resulting sentence can retain syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., Ontonotes and Wikiann, demonstrate that SAINT performs comparably to the state-of-the-art baselines.
List of keywords 
Natural Language Processing -> NLP: Language generation Natural Language Processing -> NLP: Named entities
2512
Engineering an Efficient Approximate DNF-Counter
Model counting is a fundamental problem with many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it intractable to solve exactly. Much research has therefore been focused on obtaining approximate solutions, particularly in the form of (epsilon, delta) approximations. The primary contribution of this paper is a new approach, called pepin, to approximate #DNF counting that achieves (nearly) optimal time complexity and outperforms existing FPRAS. Our approach is based on the recent breakthrough in the context of union of sets in streaming. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.
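To make the notion of an approximate #DNF counter concrete, the sketch below implements the classic Karp-Luby Monte Carlo estimator, one well-known FPRAS for this problem. It is not pepin and is not taken from the paper. The clause encoding (a dict from variable index to required truth value), the function names, and the fixed sample budget are all assumptions made for illustration; a real (epsilon, delta) scheme would derive the number of samples from epsilon and delta.

```python
import random
from itertools import product

def satisfies(assignment, clause):
    # A clause is a conjunction of literals, stored as {variable_index: required_value}.
    return all(assignment[var] == val for var, val in clause.items())

def exact_dnf_count(clauses, n_vars):
    # Brute force over all 2^n assignments; only feasible for tiny formulas.
    return sum(
        any(satisfies(a, c) for c in clauses)
        for a in product([False, True], repeat=n_vars)
    )

def karp_luby_dnf_count(clauses, n_vars, samples=20000, seed=0):
    # Hypothetical helper: the sample budget is fixed, not derived from (epsilon, delta).
    rng = random.Random(seed)
    # Clause i on its own is satisfied by 2^(n - |clause i|) assignments.
    weights = [2 ** (n_vars - len(c)) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick a clause with probability proportional to its weight ...
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # ... then a uniformly random assignment that satisfies that clause.
        assignment = [rng.random() < 0.5 for _ in range(n_vars)]
        for var, val in clauses[i].items():
            assignment[var] = val
        # Keep the sample only if i is the first clause it satisfies, so each
        # satisfying assignment corresponds to exactly one counted (clause, assignment) pair.
        first = next(j for j, c in enumerate(clauses) if satisfies(assignment, c))
        hits += (first == i)
    return total * hits / samples

formula = [{0: True, 1: True}, {1: True, 2: True}]  # (x0 AND x1) OR (x1 AND x2)
print(exact_dnf_count(formula, 3), karp_luby_dnf_count(formula, 3))
```

The brute-force counter is included only so the estimate can be checked on toy inputs; its exponential cost is the reason approximate schemes are of interest at all.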
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
2529
Efficient Online Decision Tree Learning with Active Feature Acquisition
Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real world applications, both feature values and the labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform (i.e., making costly feature queries) on a patient in order to make a diagnosis decision (i.e., predicting labels). We provide a fresh perspective to tackle this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values with a minimal cost, while using a posterior sampling scheme to maintain a low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.
List of keywords 
Machine Learning -> ML: Online learning Data Mining -> DM: Mining data streams
Machine Learning -> ML: Active learning
2543
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
In this paper, we propose an HOI-aware adaptive network named AdaAct for weakly-supervised action segmentation. Most existing methods learn a fixed network to predict the action of each frame with the neighboring frames. However, this would result in ambiguity when estimating similar actions, such as pouring juice and pouring coffee. To address this, we aim to exploit temporally global but spatially local human-object interactions (HOI) as video-level prior knowledge for action segmentation. The long-term HOI sequence provides crucial contextual information to distinguish ambiguous actions, where our network dynamically adapts to the given HOI sequence at test time. More specifically, we first design a video HOI encoder that extracts, selects, and integrates the most representative HOI throughout the video. Then, we propose a two-branch HyperNetwork to learn an adaptive temporal encoder, which automatically adjusts the parameters based on the HOI information of various videos on the fly. Extensive experiments on two widely-used datasets including Breakfast and 50Salads demonstrate the effectiveness of our method under different evaluation metrics.
List of keywords 
Computer Vision -> CV: Video analysis and understanding Computer Vision -> CV: Action and behavior recognition
2554
GeNAS: Neural Architecture Search with Better Generalization
Neural Architecture Search (NAS) aims to automatically excavate the optimal network architecture with superior test performance. Recent NAS approaches rely on validation loss or accuracy to find the superior network for the target data. In this paper, we investigate a new NAS measure for excavating architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to the state-of-the-art NAS methods. Notably, the architecture found by the flatness measure generalizes robustly to various shifts in data distribution (e.g., ImageNet-V2, -A, -O), as well as to various tasks such as object detection and semantic segmentation.
List of keywords 
Computer Vision -> CV: Machine learning for vision Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
2562
Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation
Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model’s capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and cuts and mixes the subsentences between the misclassified and correctly classified samples wisely. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. Finally, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective and more explainable than typical classification models as well as hidden-level and input-level mixup models.
List of keywords 
Natural Language Processing -> NLP: Text classification Natural Language Processing -> NLP: Sentiment analysis, stylistic analysis, and argument mining
Natural Language Processing -> NLP: Tools
2576
Synthesizing Resilient Strategies for Infinite-Horizon Objectives in Multi-Agent Systems
We consider the problem of synthesizing resilient and stochastically stable strategies for systems of cooperating agents striving to minimize the expected time between consecutive visits to selected locations in a known environment. A strategy profile is resilient if it retains its functionality even if some of the agents fail, and stochastically stable if the visiting time variance is small. We design a novel specification language for objectives involving resilience and stochastic stability, and we show how to efficiently compute strategy profiles (for both autonomous and coordinated agents) optimizing these objectives. Our experiments show that our strategy synthesis algorithm can construct highly non-trivial and efficient strategy profiles for environments with general topology.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Planning and Scheduling -> PS: Robot planning
2577
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations
Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA, an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses these issues by using 1) improved Graph Neural Networks for learning name-invariant formula representations that are tailored to their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains, with improvements of up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving Machine Learning -> ML: Applications
Machine Learning -> ML: Representation learning
2587
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems with a designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.
List of keywords 
Computer Vision -> CV: Applications Machine Learning -> ML: Applications
2590
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing works on MARL. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with a rigorous (epsilon, delta)-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise, as is commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
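For readers unfamiliar with the guarantee named above, the standard textbook definition of (epsilon, delta)-differential privacy is given below. It is not quoted from the paper, and the symbols are assumed notation: M is a randomized mechanism (here, an agent's message sender), D and D' are any two neighboring inputs, and S is any measurable set of outputs.

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S .
```

Smaller values of epsilon and delta mean that the output distribution reveals less about any single input.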
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Agent-based and Multi-agent Systems -> MAS: Agent communication
2607
A Generalized Deep Markov Random Fields Framework for Fake News Detection
Recently, the wanton dissemination of fake news on social media has adversely affected our lives, rendering automatic fake news detection a pressing issue. Current methods are often fully supervised and typically employ deep neural networks (DNN) to learn implicit relevance from labeled data, ignoring explicitly shared properties (e.g., inflammatory expressions) across fake news. To address this limitation, we propose a graph-theoretic framework, called Generalized Deep Markov Random Fields Framework (GDMRFF), that inherits the capability of deep learning while at the same time exploiting the correlations among the news articles (including labeled and unlabeled data). Specifically, we first leverage a DNN-based module to learn implicit relations, which we then reveal as the unary function of the MRF. Pairwise functions with refining effects, which encapsulate human insights, are designed to capture the explicit association among all samples. Meanwhile, an event removal module is introduced to remove event impact on pairwise functions. Note that we train GDMRFF in a semi-supervised setting, which decreases the reliance on labeled data while maximizing the potential of unlabeled data. We further develop an Ambiguity Learning Guided MRF (ALGM) model as a concretization of GDMRFF. Experiments show that ALGM outperforms the compared methods significantly on two datasets, especially when labeled data is limited.
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Web and social networks Data Mining -> DM: Mining graphs
Natural Language Processing -> NLP: Text classification
2611
RuleMatch: Matching Abstract Rules for Semi-supervised Learning of Human Standard Intelligence Tests
Raven’s Progressive Matrices (RPM), one of the standard intelligence tests in human psychology, has recently emerged as a powerful tool for studying abstract visual reasoning (AVR) abilities in machines. Although existing computational models for RPM problems achieve good performance, they require a large number of labeled training examples for supervised learning. In contrast, humans can efficiently solve unlabeled RPM problems after learning from only a few example questions. Here, we develop a semi-supervised learning (SSL) method, called RuleMatch, to train deep models with a small number of labeled RPM questions along with other unlabeled questions. Moreover, instead of using pixel-level augmentation as in object perception tasks, we exploit the nature of RPM problems and augment the data at the level of abstract rules. Specifically, we disrupt the possible rules contained among context images in an RPM question and force the two augmented variants of the same unlabeled sample to obey the same abstract rule and predict a common pseudo label for training. Extensive experiments show that the proposed RuleMatch achieves state-of-the-art performance on two popular RAVEN datasets. Our work makes an important stride in aligning abstract analogical visual reasoning abilities in machines and humans. Our code is available at https://github.com/ZjjConan/AVR-RuleMatch.
List of keywords 
Computer Vision -> CV: Visual reasoning and symbolic representation Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
2612
DFVSR: Directional Frequency Video Super-Resolution via Asymmetric and Enhancement Alignment Network
Recently, techniques utilizing frequency-based methods have gained significant attention, as they exhibit exceptional restoration capabilities for detail and structure in video super-resolution tasks. However, most of these methods have three major limitations: 1) insufficient exploration of object motion information, 2) inadequate enhancement for high-fidelity regions, and 3) loss of spatial information during convolution. In this paper, we propose a novel network, Directional Frequency Video Super-Resolution (DFVSR), to address these limitations. Specifically, we reconsider object motion from a new perspective and propose Directional Frequency Representation (DFR), which not only borrows the property of frequency representation of detail and structure information but also contains the direction information of the object motion that is extremely significant in videos. Based on this representation, we propose a Directional Frequency-Enhanced Alignment (DFEA) that uses double enhancements of task-related information to ensure the retention of high-fidelity frequency regions and generate a high-quality alignment feature. Furthermore, we design a novel Asymmetrical U-shaped network architecture to progressively fuse these alignment features and produce the final output. This architecture enables intercommunication between the same resolution levels of the encoder and decoder to supplement spatial information. Powered by the above designs, our method achieves superior performance over state-of-the-art models in both quantitative and qualitative evaluations.
List of keywords 
Computer Vision -> CV: Image and video retrieval Computer Vision -> CV: Other
2614
SS-BSN: Attentive Blind-Spot Network for Self-Supervised Denoising with Nonlocal Self-Similarity
Recently, numerous studies have been conducted on supervised learning-based image denoising methods. However, these methods rely on large-scale noisy-clean image pairs, which are difficult to obtain in practice. Denoising methods with self-supervised training that can be trained with only noisy images have been proposed to address this limitation. These methods are based on the convolutional neural network (CNN) and have shown promising performance. However, CNN-based methods do not consider the nonlocal self-similarities that are essential in traditional methods, which can limit performance. To solve this problem, this paper presents self-similarity attention (SS-Attention), a novel self-attention module that can capture nonlocal self-similarities. We focus on designing a lightweight self-attention module that operates in a pixel-wise manner, which is nearly impossible to implement using the classic self-attention module because its complexity increases quadratically with spatial resolution. Furthermore, we integrate SS-Attention into the blind-spot network, yielding the self-similarity-based blind-spot network (SS-BSN). We conduct experiments on real-world image denoising tasks. The proposed method quantitatively and qualitatively outperforms state-of-the-art methods in self-supervised denoising on the Smartphone Image Denoising Dataset (SIDD) and Darmstadt Noise Dataset (DND) benchmark datasets.
List of keywords 
Computer Vision -> CV: Computational photography Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Self-supervised Learning
2619
StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset
Modeling and capturing the 3D spatial arrangement of the human and the object is the key to perceiving 3D human-object interaction from monocular images. In this work, we propose to use the Human-Object Offset between anchors, which are densely sampled from the surfaces of the human mesh and the object mesh, to represent the human-object spatial relation. Compared with previous works, which use contact maps or implicit distance fields to encode 3D human-object spatial relations, our method is a simple and efficient way to encode the highly detailed spatial correlation between the human and the object. Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image. During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples based on this posterior distribution and minimizing the 2D-3D corresponding reprojection loss. Extensive experimental results show that our method achieves impressive results on two challenging benchmarks, the BEHAVE and InterCap datasets. Our code is publicly available at https://github.com/MoChen-bop/StackFLOW.
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
2638
Prediction with Incomplete Data under Agnostic Mask Distribution Shift
Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., the mask distribution, may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion when learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on the intra-mask correlation and on the correlation between features and mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
List of keywords 
Machine Learning -> ML: Multi-task and transfer learning
2653
Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning
We consider the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning. We propose a decentralized scheme that allows agents to detect the abnormal behavior of one compromised agent. Our approach is based on a recurrent neural network (RNN) trained during cooperative learning to predict the action distribution of other agents based on local observations. The predicted distribution is used for computing a normality score for the agents, which allows the detection of the misbehavior of other agents. To explore the robustness of the proposed detection scheme, we formulate the worst-case attack against our scheme as a constrained reinforcement learning problem. We propose to compute an attack policy by optimizing the corresponding dual function using reinforcement learning. Extensive simulations on various multi-agent benchmarks show the effectiveness of the proposed detection scheme in detecting state-of-the-art attacks and in limiting the impact of undetectable attacks.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
2666
Deliberation and Voting in Approval-Based Multi-Winner Elections
Citizen-focused democratic processes where participants deliberate on alternatives and then vote to make the final decision are increasingly popular today. While the computational social choice literature has extensively investigated voting rules, there is limited work that explicitly looks at the interplay of the deliberative process and voting. In this paper, we build a deliberation model using established models from the opinion-dynamics literature and study the effect of different deliberation mechanisms on voting outcomes achieved when using well-studied voting rules. Our results show that deliberation generally improves welfare and representation guarantees, but the results are sensitive to how the deliberation process is organized. We also show, experimentally, that simple voting rules, such as approval voting, perform as well as more sophisticated rules such as proportional approval voting or method of equal shares if deliberation is properly supported. This has ramifications on the practical use of such voting rules in citizen-focused democratic processes.
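As an illustration of two of the voting rules named above, the sketch below computes a committee under approval voting and under proportional approval voting (PAV) on a toy profile. It follows the standard textbook definitions of these rules rather than any code from the paper. The function names, the tie-breaking by candidate order, and the example ballots are assumptions made for illustration; the method of equal shares and the deliberation model are not covered.

```python
from itertools import combinations

def approval_voting(ballots, candidates, k):
    # Select the k candidates approved by the most voters
    # (ties are broken by position in `candidates`, an illustrative choice).
    counts = {c: sum(c in b for b in ballots) for c in candidates}
    ranked = sorted(candidates, key=lambda c: -counts[c])
    return set(ranked[:k])

def pav_score(ballots, committee):
    # A voter with t approved committee members contributes 1 + 1/2 + ... + 1/t.
    return sum(
        sum(1 / j for j in range(1, len(b & committee) + 1))
        for b in ballots
    )

def proportional_approval_voting(ballots, candidates, k):
    # Brute-force search over all size-k committees; fine for toy instances only.
    best = max(combinations(candidates, k), key=lambda w: pav_score(ballots, set(w)))
    return set(best)

candidates = ["a", "b", "c"]
ballots = [{"a", "b"}] * 3 + [{"c"}] * 2  # three voters approve a and b, two approve c
print(approval_voting(ballots, candidates, 2))
print(proportional_approval_voting(ballots, candidates, 2))
```

On this profile approval voting picks a and b, while PAV gives the minority a seat by picking a and c: the harmonic weights reduce the marginal value of a second representative for the same group of voters.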
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
2670
FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation
Vertical federated learning (VFL) allows an active party with labeled data to leverage auxiliary features from the passive parties to improve model performance. Concerns about private feature and label leakage in both the training and inference phases of VFL have drawn wide research attention. In this paper, we propose a general privacy-preserving vertical federated deep learning framework called FedPass, which leverages adaptive obfuscation to protect features and labels simultaneously. Strong privacy-preserving capabilities for private features and labels are theoretically proved (in Theorems 1 and 2). Extensive experimental results with different datasets and network architectures also justify the superiority of FedPass over existing methods in light of its near-optimal trade-off between privacy and model performance.
List of keywords 
Machine Learning -> ML: Federated learning Computer Vision -> CV: Bias, fairness and privacy
2671
Contrastive Label Enhancement
Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity. Since it is difficult to directly obtain label distributions, many studies focus on how to recover label distributions from logical labels, a task dubbed label enhancement (LE). Existing LE methods estimate label distributions by simply building a mapping relationship between features and label distributions under the supervision of logical labels. They typically overlook the fact that both features and logical labels are descriptions of the instance from different views. Therefore, we propose a novel method called Contrastive Label Enhancement (ConLE), which integrates features and logical labels into a unified projection space to generate high-level features via a contrastive learning strategy. In this approach, features and logical labels belonging to the same sample are pulled closer, while those of different samples are projected farther away from each other in the projection space. Subsequently, we leverage the obtained high-level features to gain label distributions through a well-designed training strategy that considers the consistency of label attributes. Extensive experiments on LDL benchmark datasets demonstrate the effectiveness and superiority of our method.
List of keywords 
Machine Learning -> ML: Multi-label Machine Learning -> ML: Multi-view learning
2676
SAT-Based PAC Learning of Description Logic Concepts
We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic ELHr based on a SAT solver, and compare its performance to a state-of-the-art learner.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies Knowledge Representation and Reasoning -> KRR: Learning and reasoning
2692
A Novel Demand Response Model and Method for Peak Reduction in Smart Grids — PowerTAC
One of the widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers’ (agents’) usage patterns in response to the signal from the distribution company. Often, these signals are in the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that depicts the probability of an agent reducing its load as a function of the discount offered to it. We call it reduction probability (RP). The RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS–ExpResponse, that outputs the discounts to each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB–ExpResponse, to learn RRs. Experimentally, we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
List of keywords 
Machine Learning -> ML: Applications Multidisciplinary Topics and Applications -> MDA: Energy, environment and sustainability
2693
MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation
Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made by resorting to generative models for graphs. However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models. The primary challenge to hierarchical generation is the non-differentiable issue caused by the generation of intermediate discrete coarsened graph structures. To sidestep this issue, we cast the tricky hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, implying its high capacity to model data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Probabilistic machine learning
Machine Learning -> ML: Sequence and graph learning
2705
An Empirical Study on the Language Modal in Visual Question Answering
Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language prior bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that 1) apart from the prior bias caused by question types, postfix-related bias has a notable influence in inducing biases, and 2) training VQA models with word-sequence-related variant questions improved performance on the out-of-distribution benchmark, and LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models’ dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.
List of keywords
Machine Learning -> ML: Multi-modal learning
Computer Vision -> CV: Vision and language
Natural Language Processing -> NLP: Question answering
2716
LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity
Heterophily has been considered an issue that hurts the performance of Graph Neural Networks (GNNs). To address this issue, some existing work uses a graph-level weighted fusion of the information of multi-hop neighbors to include more nodes with homophily. However, the heterophily might differ among nodes, which requires considering the local topology. Motivated by this, we propose to use the local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module. For better fusion, we propose a novel and efficient Initial Residual Difference Connection (IRDC) to extract more informative multi-hop information. Moreover, we provide theoretical analysis on the effectiveness of LocalSim in representing node homophily on synthetic graphs. Extensive evaluations over real benchmark datasets show that our proposed method, namely Local Similarity Graph Neural Network (LSGNN), can offer comparable or superior state-of-the-art performance on both homophilic and heterophilic graphs. Meanwhile, the plug-and-play model can significantly boost the performance of existing GNNs.
List of keywords
Machine Learning -> ML: Sequence and graph learning
Data Mining -> DM: Mining graphs
2727
Hierarchical Semantic Contrast for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS) with image-level annotations has achieved great progress through class activation maps (CAMs). Since vanilla CAMs can hardly serve as guidance to bridge the gap between full and weak supervision, recent studies explore semantic representations to make CAMs fit for WSSS and demonstrate encouraging results. However, they generally exploit single-level semantics, which may hamper the model from learning a comprehensive semantic structure. Motivated by the prior that each image has multiple levels of semantics, we propose hierarchical semantic contrast (HSC) to ameliorate the above problem. It conducts semantic contrast from a coarse-grained to a fine-grained perspective, including ROI level, class level, and pixel level, helping the model learn a better understanding of object patterns. To further improve CAM quality, building upon HSC, we explore consistency regularization of cross supervision and develop momentum prototype learning to utilize abundant semantics across different images. Extensive studies show that our plug-and-play learning paradigm, HSC, can significantly boost CAM quality on both non-saliency-guided and saliency-guided baselines, and establish new state-of-the-art WSSS performance on the PASCAL VOC 2012 dataset. Code is available at https://github.com/Wu0409/HSC_WSSS.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Representation learning
Computer Vision -> CV: Scene analysis and understanding
2734
Open Anomalous Trajectory Recognition via Probabilistic Metric Learning
Typically, trajectories considered anomalous are the ones deviating from usual (e.g., traffic-dictated) driving patterns. However, this closed-set context fails to recognize the unknown anomalous trajectories, resulting in an insufficient self-motivated learning paradigm. In this study, we investigate the novel Anomalous Trajectory Recognition problem in an Open-world scenario (ATRO) and introduce a novel probabilistic Metric learning model, namely ATROM, to address it. Specifically, ATROM can detect the presence of unknown anomalous behavior in addition to identifying known behavior. It has a Mutual Interaction Distillation that uses contrastive metric learning to explore the interactive semantics regarding the diverse behavioral intents and a Probabilistic Trajectory Embedding that forces the trajectories with distinct behaviors to follow different Gaussian priors. More importantly, ATROM offers a probabilistic metric rule to discriminate between known and unknown behavioral patterns by taking advantage of the approximation of multiple priors. Experimental results on two large-scale trajectory datasets demonstrate the superiority of ATROM in addressing both known and unknown anomalous patterns.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Transportation
2738
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Decision-based methods have been shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require access to the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it directly affects the query efficiency. Recent works have attempted to utilize gradient priors to help score-based methods obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, and are thus difficult to extend directly to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus preserving the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate convergence, thus improving the query efficiency. Extensive experiments have demonstrated that the proposed method outperforms other strong baselines significantly.
List of keywords 
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
2748
VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning
Unlike conventional zero-shot learning (CZSL), which only focuses on the recognition of unseen classes by using the classifier trained on seen classes and semantic embeddings, generalized zero-shot learning (GZSL) aims at recognizing both the seen and unseen classes, so it is more challenging due to the extreme training imbalance. Recently, some feature generation methods have introduced metric learning to enhance the discriminability of visual features. Although these methods achieve good results, they focus only on metric learning in the visual feature space to enhance features and ignore the association between the feature space and the semantic space. Since GZSL methods use semantics as prior knowledge to migrate visual knowledge to unseen classes, the consistency between the visual space and the semantic space is critical. To this end, we propose relational metric learning, which can relate the metrics in the two spaces and make the distributions of the two spaces more consistent. Based on the generation method and relational metric learning, we propose a novel GZSL method, termed VS-Boost, which can effectively boost the association between vision and semantics. The experimental results demonstrate that our method is effective and achieves significant gains on five benchmark datasets compared with the state-of-the-art methods.
List of keywords
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Computer Vision -> CV: Recognition (object detection, categorization)
2753
Low-Confidence Samples Mining for Semi-supervised Object Detection
Reliable pseudo labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo labels with high confidence, ignoring valuable pseudo labels with lower confidence. Additionally, the insufficient mining of unlabeled data results in an excessively low recall rate, thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low-confidence pseudo labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch based on low-resolution feature maps to extract reliable large-area instances, the IoUs of which are higher than those of small-area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutual learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to be applied to both Faster-RCNN and Deformable-DETR. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Data Mining -> DM: Applications
Data Mining -> DM: Exploratory data mining
2754
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages of both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between memory consumption and hardware utilization, thereby automatically generating the distributed computation graph and maximizing the overall system throughput. In addition, OSDP introduces operator splitting to further alleviate peak memory footprints during training with negligible overhead, which enables the training of larger models as well as higher throughput. Extensive experimental results of OSDP on multiple kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.
List of keywords
Data Mining -> DM: Parallel, distributed and cloud-based high performance mining
Data Mining -> DM: Big data and scalability
2757
Bayesian Optimization with Switching Cost: Regret Analysis and Lookahead Variants
Bayesian Optimization (BO) has recently received increasing attention due to its efficiency in optimizing expensive-to-evaluate functions. For some practical problems, it is essential to consider the path-dependent switching cost between consecutive sampling locations given a total traveling budget. For example, when using a drone to locate cracks in a building wall or search for lost survivors in the wild, the search path needs to be efficiently planned given the limited battery power of the drone. Tackling such problems requires a careful cost-benefit analysis of candidate locations and balancing exploration and exploitation. In this work, we formulate such a problem as a constrained Markov Decision Process (MDP) and solve it by proposing a new distance-adjusted multi-step look-ahead acquisition function, the distUCB, and using rollout approximation. We also provide a theoretical regret analysis of the distUCB-based Bayesian optimization algorithm. In addition, the empirical performance of the proposed algorithm is tested based on both synthetic and real data experiments, and it shows that our cost-aware non-myopic algorithm performs better than other popular alternatives.
List of keywords
Machine Learning -> ML: Bayesian learning
Machine Learning -> ML: Hyperparameter optimization
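To make the cost-benefit trade-off in the abstract above concrete, here is a minimal one-step sketch of a distance-penalized upper confidence bound. It is an illustration only: the additive penalty, the parameters `beta` and `cost_weight`, and the function names are assumptions, and the paper's distUCB is a multi-step look-ahead acquisition with rollout, which this myopic version does not reproduce.

```python
import math


def dist_adjusted_ucb(mean, std, candidate, current, beta=2.0, cost_weight=0.5):
    """UCB score minus a penalty proportional to the travel distance.
    The additive form is an assumption for illustration."""
    travel = math.dist(candidate, current)
    return mean + beta * std - cost_weight * travel


def pick_next(candidates, posterior, current, budget):
    """Pick the best-scoring candidate that is reachable within the
    remaining travel budget; return None if nothing is reachable."""
    reachable = [c for c in candidates if math.dist(c, current) <= budget]
    if not reachable:
        return None
    return max(reachable, key=lambda c: dist_adjusted_ucb(*posterior[c], c, current))


# Hypothetical posterior (mean, std) at three candidate locations.
posterior = {
    (0.0, 1.0): (0.0, 0.5),
    (3.0, 4.0): (1.0, 1.5),
    (10.0, 0.0): (5.0, 5.0),  # attractive, but too far for the budget below
}
print(pick_next(list(posterior), posterior, current=(0.0, 0.0), budget=6.0))
```

In a real setting the posterior mean and standard deviation would come from a Gaussian process model, and the budget would shrink by the distance travelled after each move.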
2758
Learning Survival Distribution with Implicit Survival Function
Survival analysis aims at modeling the relationship between covariates and event occurrence with some untracked (censored) samples. In implementation, existing methods model the survival distribution with strong assumptions or in a discrete time space for likelihood estimation with censorship, which leads to weak generalization. In this paper, we propose Implicit Survival Function (ISF) based on Implicit Neural Representation for survival distribution estimation without strong assumptions, and employ numerical integration to approximate the cumulative distribution function for prediction and optimization. Experimental results show that ISF outperforms the state-of-the-art methods on three public datasets and is robust to the hyperparameter controlling estimation precision.
List of keywords 
Machine Learning -> ML: Other
2759
Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions
Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL simultaneously learns the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.
List of keywords
Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
2774
Approximating Fair Division on D-Claw-Free Graphs
We study the problem of fair allocation of indivisible goods, where the goods form a graph and the bundles distributed to agents are connected subgraphs of this graph. We focus on the maximin share and the proportional fairness criteria. It is well-known that allocations satisfying these criteria may not exist for many graphs including complete graphs and cycles. Therefore, it is natural to look for approximate allocations, i.e., allocations guaranteeing each agent a certain portion of the value that is satisfactory to her. In this paper, we consider the class of graphs of goods which do not contain a star with d+1 edges (where d > 1) as an induced subgraph. For this class of graphs, we prove that there is an allocation assigning each agent a connected bundle of value at least 1/d of her maximin share. Moreover, for the same class of graphs of goods, we show a theorem which specifies what fraction of the proportional share can be guaranteed to each agent if the values of single goods for the agents are bounded by a given fraction of this share.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Agent-based and Multi-agent Systems -> MAS: Resource allocation
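As a small worked illustration of the maximin share used in the abstract above, the sketch below computes it by brute force when the goods form a path, so every connected bundle is a contiguous segment. The function name `maximin_share_on_path`, the example values, and the assumption of non-negative additive values are illustrative choices and are not taken from the paper. A path has maximum degree 2, so it contains no induced star with 3 edges and the abstract's guarantee applies with d = 2.

```python
from itertools import combinations


def maximin_share_on_path(values, n_agents):
    """Maximin share of one agent when goods lie on a path: the best, over
    all splits into n_agents contiguous bundles, of the least bundle value.
    Assumes non-negative additive values."""
    m = len(values)
    if n_agents > m:
        return 0  # some bundle must be empty, so the least value is 0
    best = None
    # Choose n_agents - 1 cut positions between consecutive goods.
    for cuts in combinations(range(1, m), n_agents - 1):
        bounds = (0, *cuts, m)
        worst = min(sum(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = worst if best is None else max(best, worst)
    return best


values = [3, 1, 2, 4]  # hypothetical values of four goods along a path
mms = maximin_share_on_path(values, n_agents=2)
print(mms, mms / 2)  # maximin share, and the 1/d fraction of it for d = 2
```

The brute force enumerates all cut positions, which is fine for tiny examples but grows combinatorially with the number of goods.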
2777
Learning Gaussian Mixture Representations for Tensor Time Series Forecasting
Tensor time series (TTS) data, a generalization of one-dimensional time series on a high-dimensional space, is ubiquitous in real-world scenarios, especially in monitoring systems involving multi-source spatio-temporal data (e.g., transportation demands and air pollutants). Compared to modeling time series or multivariate time series, which has received much attention and achieved tremendous progress in recent years, tensor time series has received less attention. Properly coping with tensor time series is a much more challenging task, due to its high-dimensional and complex inner structure. In this paper, we develop a novel TTS forecasting framework, which seeks to individually model each heterogeneity component implied in the time, the location, and the source variables. We name this framework GMRL, short for Gaussian Mixture Representation Learning. Experimental results on two real-world TTS datasets verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published at https://github.com/beginner-sketch/GMRL.
List of keywords
Data Mining -> DM: Mining spatial and/or temporal data
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
Machine Learning -> ML: Time series and data streams
2780
Globally Consistent Federated Graph Autoencoder for Non-IID Graphs
Graph neural networks (GNNs) have been applied successfully in many machine learning tasks due to their advantages in utilizing neighboring information. Recently, with the global enactment of privacy protection regulations, federated GNNs have gained increasing attention in academia and industry. However, the graphs owned by different participants could be non-independently-and-identically distributed (non-IID), leading to the deterioration of federated GNNs’ accuracy. In this paper, we propose a globally consistent federated graph autoencoder (GCFGAE) to overcome the non-IID problem in unsupervised federated graph learning via three innovations. First, by integrating federated learning with split learning, we train a unique global model instead of FedAvg-style global and local models, yielding results consistent with those of the centralized GAE. Second, we design a collaborative computation mechanism considering overlapping vertices to reduce communication overhead during forward propagation. Third, we develop a layer-wise and block-wise gradient computation strategy to reduce the space and communication complexity during backward propagation. Experiments on real-world datasets demonstrate that GCFGAE achieves not only higher accuracy but also around 500 times lower communication overhead and 1000 times smaller space overhead than existing federated GNN models.
List of keywords
Machine Learning -> ML: Federated learning
Data Mining -> DM: Mining graphs
2788
Video Object Segmentation in Panoptic Wild Scenes
In this paper, we introduce semi-supervised video object segmentation (VOS) to panoptic wild scenes and present a large-scale benchmark as well as a baseline method for it. Previous benchmarks for VOS with sparse annotations are not sufficient to train or evaluate a model that needs to process all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers various real-world object categories which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which associates multiple objects by panoptic identification in a pyramid architecture on multiple scales. Experimental results show that VIPOSeg can not only boost the performance of VOS models by panoptic training but also evaluate them comprehensively in panoptic scenes. Previous methods for classic VOS still need to improve in performance and efficiency when dealing with panoptic scenes, while our PAOT achieves SOTA performance with good efficiency on VIPOSeg and previous VOS benchmarks. PAOT also ranks 1st in the VOT2022 challenge. Our dataset and code are available at https://github.com/yoxu515/VIPOSeg-Benchmark.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Video analysis and understanding
2789
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
Cross-domain NER is a challenging task that addresses the low-resource problem in practical scenarios. Previous typical solutions mainly obtain an NER model from pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Named entities
2793
Revisiting the Evaluation of Deep Learning-Based Compiler Testing
A high-quality program generator is essential to effective automated compiler testing. Engineering such a program generator is difficult, time-consuming, and specific to the language under testing, thus requiring tremendous efforts from human experts with language-specific domain knowledge. To avoid repeatedly writing program generators for different languages, researchers recently proposed a language-agnostic approach based on deep learning techniques to automatically learn a program generator (referred to as DLG) from existing programs. Evaluations show that DLGs outperform Language-Specific Program Generators (LSGs) in testing compilers.
However, we argue that it is unfair to use LSGs as baselines to evaluate DLGs. LSGs aim to validate compiler optimizations by only generating compilable, well-defined test programs; this restriction inevitably impairs the diversity of the language features used in the generated programs. In contrast, DLGs do not aim to validate the correctness of compiler optimizations, and their generated programs are not guaranteed to be well-defined or even compilable. Therefore, it is not surprising that DLG-generated programs are more diverse in terms of used language features than LSG-generated ones.
This study revisits the evaluation of DLGs, and proposes a new, fair, simple yet strong baseline named Kitten for evaluating DLGs. Given a dataset consisting of human-written programs, instead of using deep learning techniques to learn a program generator, Kitten directly derives new programs by mutating the programs in the dataset. Extensive experiments with more than 1,500 CPU-hours demonstrate that the state-of-the-art DLGs fail to compete against such a simple baseline: 3 vs. 1,750 hang bugs, 1 vs. 34 distinct compiler crashes. We believe that DLGs still have much room for improvement.
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Software engineering
2815
Pyramid Diffusion Models for Low-light Image Enhancement
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions. Code and supplementary materials are available at https://github.com/limuloo/PyDIff.git.
List of keywords
Computer Vision -> CV: Computational photography
Computer Vision -> CV: Neural generative models, auto encoders, GANs
2816
Genetic Prompt Search via Exploiting Language Model Probabilities
Prompt tuning for large-scale pretrained language models (PLMs) has shown remarkable potential, especially in low-resource scenarios such as few-shot learning. Moreover, derivative-free optimisation (DFO) techniques make it possible to tune prompts for a black-box PLM to better fit downstream tasks. However, there are usually preconditions to apply existing DFO-based prompt tuning methods, e.g. the backbone PLM needs to provide extra APIs so that hidden states (and/or embedding vectors) can be injected into it as continuous prompts, or carefully designed (discrete) manual prompts need to be available beforehand, serving as the initial states of the tuning algorithm. To waive such preconditions and make DFO-based prompt tuning ready for general use, this paper introduces a novel genetic algorithm (GA) that evolves from empty prompts, and uses the predictive probabilities derived from the backbone PLM(s) on the basis of a (few-shot) training set to guide the token selection process during prompt mutations. Experimental results on diverse benchmark datasets show that the proposed precondition-free method significantly outperforms the existing DFO-style counterparts that require preconditions, including black-box tuning, genetic prompt search and gradient-free instructional prompt search.
List of keywords
Natural Language Processing -> NLP: Language models
Machine Learning -> ML: Few-shot learning
Natural Language Processing -> NLP: Other
2836
The Hardness of Reasoning about Probabilities and Causality
We study formal languages which are capable of fully expressing quantitative probabilistic reasoning and do-calculus reasoning for causal effects, from a computational complexity perspective. We focus on satisfiability problems whose instance formulas allow expressing many tasks in probabilistic and causal inference. The main contribution of this work is establishing the exact computational complexity of these satisfiability problems. We introduce a new natural complexity class, named succ∃R, which can be viewed as a succinct variant of the well-studied class ∃R, and show that these problems are complete for succ∃R. Our results imply even stronger limitations on the use of algorithmic methods for reasoning about probabilities and causality than previous state-of-the-art results that rely only on the NP- or ∃R-completeness of the satisfiability problems for some restricted languages.
List of keywords
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Knowledge Representation and Reasoning -> KRR: Causality
Machine Learning -> ML: Causality
2840
Part Aware Contrastive Learning for Self-Supervised Action Recognition
In recent years, remarkable results have been achieved in self-supervised action recognition using skeleton sequences with contrastive learning. It has been observed that the semantic distinction of human action features is often represented by local body parts, such as legs or hands, which are advantageous for skeleton-based action recognition. This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations. To achieve this, a multi-head attention mask module is employed to learn the soft attention mask features from the skeletons, suppressing non-salient local features while accentuating local salient features, thereby bringing similar local features closer in the feature space. Additionally, ample contrastive pairs are generated by expanding contrastive pairs based on salient and non-salient features with global features, which guide the network to learn the semantic representations of the entire skeleton. Therefore, with the attention mask mechanism, SkeAttnCLR learns local features under different data augmentation views. The experiment results demonstrate that the inclusion of local feature similarity significantly enhances skeleton-based action representation. Our proposed SkeAttnCLR outperforms state-of-the-art methods on NTURGB+D, NTU120-RGB+D, and PKU-MMD datasets. The code and settings are available at this repository: https://github.com/GitHubOfHyl97/SkeAttnCLR.
List of keywords
Computer Vision -> CV: Action and behavior recognition
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Self-supervised Learning
2841
ODEE: A One-Stage Object Detection Framework for Overlapping and Nested Event Extraction
The task of extracting overlapping and nested events has received significant attention in recent times, as prior research has primarily focused on extracting flat events, overlooking the intricacies of overlapping and nested occurrences. In this work, we present a new approach to Event Extraction (EE) by reformulating it as an object detection task on a table of token pairs. Our proposed one-stage event extractor, called ODEE, can handle overlapping and nested events. The model is designed with a vertex-based tagging scheme and two auxiliary tasks of predicting the spans and types of event trigger words and argument entities, leveraging the full span information of event elements. Furthermore, in the training stage, we introduce a negative sampling method for table cells to address the imbalance between positive and negative table cell tags while also improving computational efficiency. Empirical evaluations demonstrate that ODEE achieves state-of-the-art performance on three benchmarks for overlapping and nested EE (i.e., FewFC, Genia11, and Genia13). Furthermore, ODEE outperforms current state-of-the-art methods in terms of both number of parameters and inference speed, indicating its high computational efficiency. To facilitate future research in this area, the code is publicly available at https://github.com/NingJinzhong/ODEE.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Named entities
2847
Approximate Inference in Logical Credal Networks
The Logical Credal Network or LCN is a recent probabilistic logic designed for effective aggregation and reasoning over multiple sources of imprecise knowledge. An LCN specifies a set of probability distributions over all interpretations of a set of logical formulas for which marginal and conditional probability bounds on their truth values are known. Inference in LCNs involves the exact solution of a non-convex non-linear program defined over an exponentially large number of non-negative real valued variables and, therefore, is limited to relatively small problems. In this paper, we present ARIEL — a novel iterative message-passing scheme for approximate inference in LCNs. Inspired by classical belief propagation for graphical models, our method propagates messages that involve solving considerably smaller local non-linear programs. Experiments on several classes of LCNs demonstrate clearly that ARIEL yields high quality solutions compared with exact inference and scales to much larger problems than previously considered.
List of keywords
Uncertainty in AI -> UAI: Graphical models
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Uncertainty in AI -> UAI: Inference
2849
Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks
Brain-inspired spiking neural networks (SNNs) are receiving increasing attention due to their asynchronous event-driven characteristics and low power consumption. As attention mechanisms have recently become an indispensable part of sequence dependence modeling, the combination of SNNs and attention mechanisms holds great potential for energy-efficient and high-performance computing paradigms. However, existing works cannot benefit from both temporal-wise attention and the asynchronous characteristic of SNNs. To fully leverage the advantages of both SNNs and attention mechanisms, we propose an SNN-based spatial-temporal self-attention (STSA) mechanism, which calculates the feature dependence across the time and space domains without destroying the asynchronous transmission properties of SNNs. To further improve the performance, we also propose a spatial-temporal relative position bias (STRPB) for STSA to consider the spatiotemporal position of spikes. Based on the STSA and STRPB, we construct a spatial-temporal spiking Transformer framework, named STS-Transformer, which is powerful and enables SNNs to work in an asynchronous event-driven manner. Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our experimental results show that STS-Transformer outperforms other state-of-the-art models.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Humans and AI -> HAI: Applications
Humans and AI -> HAI: Cognitive systems
2869
Teacher Assistant-Based Knowledge Distillation Extracting Multi-level Features on Single Channel Sleep EEG
Sleep stage classification is of great significance to the diagnosis of sleep disorders. However, existing sleep stage classification models based on deep learning are usually relatively large in size (wider and deeper), which makes them hard to deploy on wearable devices. Therefore, it is a challenge to lighten the existing sleep stage classification models. In this paper, we propose a novel general knowledge distillation framework for sleep stage classification tasks called SleepKD. Our SleepKD, composed of the multi-level module, teacher assistant module, and other knowledge distillation modules, aims to lighten large-scale sleep stage classification models. Specifically, the multi-level module is able to transfer the multi-level knowledge extracted from sleep signals by the teacher model (large-scale model) to the student model (lightweight model). Moreover, the teacher assistant module bridges the large gap between the teacher and student networks and further improves the distillation. We evaluate our method on two public sleep datasets (Sleep-EDF and ISRUC-III). Compared to the baseline methods, the results show that our knowledge distillation framework achieves state-of-the-art performance. SleepKD can significantly lighten the sleep model while maintaining its classification performance. The source code is available at https://github.com/HychaoWang/SleepKD.
List of keywords
Machine Learning -> ML: Applications
Humans and AI -> HAI: Brain sciences
Machine Learning -> ML: Classification
Multidisciplinary Topics and Applications -> MDA: Health and medicine
2873
CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing
The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, only containing crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation and action spaces and provide a methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data augmentation in two different-sized procedurally generated mazes.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Robustness
2877
SemiGNN-PPI: Self-Ensembling Multi-Graph Neural Network for Efficient and Generalizable Protein–Protein Interaction Prediction
Protein-protein interactions (PPIs) are crucial in various biological processes and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods suffer from significant performance degradation under complex real-world scenarios due to various factors, e.g., label scarcity and domain shift. In this paper, we propose a self-ensembling multi-graph neural network (SemiGNN-PPI) that can effectively predict PPIs while being both efficient and generalizable. In SemiGNN-PPI, we not only model the protein correlations but also explore the label dependencies by constructing and processing multiple graphs from the perspectives of both features and labels in the graph learning process. We further marry GNN with Mean Teacher to effectively leverage unlabeled graph-structured PPI data for self-ensemble graph learning. We also design multiple graph consistency constraints to align the student and teacher graphs in the feature embedding space, enabling the student model to better learn from the teacher model by incorporating more relationships. Extensive experiments on PPI datasets of different scales with different evaluation settings demonstrate that SemiGNN-PPI outperforms state-of-the-art PPI prediction methods, particularly in challenging scenarios such as training with limited annotations and testing on unseen data.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Sequence and graph learning
Machine Learning -> ML: Semi-supervised learning
2883
Locate, Refine and Restore: A Progressive Enhancement Network for Camouflaged Object Detection
Camouflaged Object Detection (COD) aims to segment objects that blend in with their surroundings. Most existing methods tackle this issue with a single-stage framework, which tends to degrade performance in the face of small objects, low-contrast objects and objects with diverse appearances. In this paper, we propose a novel Progressive Enhancement Network (PENet) for COD by imitating the human visual detection system, which follows a three-stage detection process: locate objects, refine textures and restore boundaries. Specifically, our PENet contains three key modules, i.e., the object location module (OLM), the group attention module (GAM) and the context feature restoration module (CFRM). The OLM is designed to position the object globally, the GAM is developed to refine both high-level semantic and low-level texture feature representation, and the CFRM is leveraged to effectively aggregate multi-level features for progressively restoring the clear boundary. Extensive results demonstrate that our PENet significantly outperforms 32 state-of-the-art methods on four widely used benchmark datasets.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Recognition (object detection, categorization)
2905
Computing Abductive Explanations for Boosted Regression Trees
We present two algorithms for generating (resp. evaluating) abductive explanations for boosted regression trees. Given an instance x and an interval I containing its value F(x) for the boosted regression tree F at hand, the generation algorithm returns a (most general) term t over the Boolean conditions in F such that every instance x′ satisfying t is such that F(x′) ∈ I. The evaluation algorithm tackles the corresponding inverse problem: given F, x and a term t over the Boolean conditions in F such that t covers x, find the least interval I_t such that for every instance x′ covered by t we have F(x′) ∈ I_t. Experiments on various datasets show that the two algorithms are practical enough to be used for generating (resp. evaluating) abductive explanations for boosted regression trees based on a large number of Boolean conditions.
List of keywords
Machine Learning -> ML: Explainable/Interpretable machine learning
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Machine Learning -> ML: Regression
2907
Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
We study the Improving Multi-Armed Bandit (IMAB) problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. We study the tension between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time and b) ensuring that arms with better long-term rewards get sufficient pulls even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm until it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer Ω(T) policy regret and Ω(k) competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is O(k).
List of keywords
Machine Learning -> ML: Online learning
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Uncertainty in AI -> UAI: Sequential decision making
2911
Meta-Tsallis-Entropy Minimization: A New Self-Training Approach for Domain Adaptation on Text Classification
Text classification is a fundamental task for natural language processing, and adapting text classification models across domains has broad applications. 
Self-training generates pseudo-examples from the model’s predictions and iteratively trains on the pseudo-examples, i.e., minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus, self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy minimization (MTEM). MTEM uses an instance-adaptive Tsallis entropy to replace the Gibbs entropy and a meta-learning algorithm to optimize the instance-adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose a technique to approximate the second-order derivative involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model’s prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT by an average of 4 percent on the benchmark dataset.
List of keywords
Natural Language Processing -> NLP: Text classification
Natural Language Processing -> NLP: Applications
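For readers unfamiliar with the entropy named in the MTEM abstract above, the following is a minimal sketch of the standard Tsallis entropy, S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers the Gibbs (Shannon) entropy as q approaches 1. The function name and the example distributions are illustrative assumptions; the paper's instance-adaptive choice of q and its meta-learning procedure are not reproduced here.

```python
import math


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy of a discrete distribution.

    probs: probabilities summing to 1 (hypothetical model outputs).
    q: entropic index; q == 1 falls back to the Gibbs/Shannon entropy,
       which is the limit of the Tsallis entropy as q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain prediction
one_hot = [1.0, 0.0, 0.0, 0.0]       # fully confident prediction

print(tsallis_entropy(uniform, 2.0))
print(tsallis_entropy(one_hot, 2.0))
print(tsallis_entropy(uniform, 1.0))
```

With q = 2 the uniform four-class prediction scores 0.75 and the one-hot prediction scores 0, so minimizing the quantity on target-domain predictions pushes toward confident outputs, as entropy minimization in self-training does.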
2912
The Parameterized Complexity of Finding Concise Local Explanations
We consider the computational problem of finding a smallest local explanation (anchor) for classifying a given feature vector (example) by a black-box model. After showing that the problem is NP-hard in general, we study various natural restrictions of the problem in terms of problem parameters to see whether these restrictions make the problem fixed-parameter tractable or not. We draw a detailed and systematic complexity landscape for combinations of parameters, including the size of the anchor, the size of the anchor’s coverage, and parameters that capture structural aspects of the problem instance, including rank-width, twin-width, and maximum difference.
List of keywords
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
2927
ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks
Multistep prediction models are essential for the simulation and model-predictive control of dynamical systems. Verifying the safety of such models is a multi-faceted problem requiring both system-theoretic guarantees and the establishment of trust with human users. In this work, we propose a novel approach, ReLiNet (Recurrent Linear Parameter Varying Network), to ensure safety for multistep prediction of dynamical systems. Our approach simplifies a recurrent neural network to a switched linear system that is constrained to guarantee exponential stability, which acts as a surrogate for safety from a system-theoretic perspective. Furthermore, ReLiNet’s computation can be reduced to a single linear model for each time step, resulting in predictions that are explainable by definition, thereby establishing trust from a human-centric perspective. Our quantitative experiments show that ReLiNet achieves prediction accuracy comparable to that of state-of-the-art recurrent neural networks, while achieving more faithful and robust explanations than the model-agnostic explanation method LIME.
List of keywords
Machine Learning -> ML: Recurrent networks
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
2929
Complex Contagion Influence Maximization: A Reinforcement Learning Approach
In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold), which assume that the individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where the influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very few studies of complex contagion influence maximization (CCIM) problems. This is partly because CC is non-submodular, and solving such problems has been an open challenge.
In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves state-of-the-art performance on 9 real-world networks.
List of keywords
Search -> S: Combinatorial search and optimisation
Machine Learning -> ML: Reinforcement learning
Multidisciplinary Topics and Applications -> MDA: Web and social networks
2969
Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++
Reliable outlier detection is critical for real-world deployment of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as being impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many recent solutions for correcting these biases are computationally expensive, or do not generalize well to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations — “stirring” and “shaking” — which ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are inexpensive and readily computed at evaluation time. We test our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection, particularly on datasets with complex, natural images. We also show that our solutions work well with other types of generative models (generative flows and variational autoencoders) and that their efficacy is governed by each model’s reliance on local dependencies. In sum, lightweight remedies suffice to achieve robust outlier detection on image data with deep generative models.
List of keywords
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Robustness
2989
ICDA: Illumination-Coupled Domain Adaptation Framework for Unsupervised Nighttime Semantic Segmentation
The performance of nighttime semantic segmentation has been significantly improved thanks to recent unsupervised methods. However, these methods still suffer from complex domain gaps, i.e., the challenging illumination gap and the inherent dataset gap. In this paper, we propose the illumination-coupled domain adaptation framework (ICDA) to effectively avoid the illumination gap and mitigate the dataset gap by coupling daytime and nighttime images as a whole with semantic relevance. Specifically, we first design a new composite enhancement method (CEM) that considers not only illumination but also spatial consistency to construct the source and target domain pairs, which provides the basic adaptation unit for our ICDA. Next, to avoid the illumination gap, we devise the Deformable Attention Relevance (DAR) module to capture the semantic relevance inside each domain pair, which can couple the daytime and nighttime images at the feature level and adaptively guide the predictions of nighttime images. Besides, to mitigate the dataset gap and acquire domain-invariant semantic relevance, we propose the Prototype-based Class Alignment (PCA) module, which improves the usage of category information and performs fine-grained alignment. Extensive experiments show that our method reduces the complex domain gaps and achieves state-of-the-art performance for nighttime semantic segmentation. Our code is available at https://github.com/chenghaoDong666/ICDA.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
2998
Efficient NLP Model Finetuning via Multistage Data Filtering
As model finetuning is central to modern NLP, we set out to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To this end, we set out to filter training examples in a streaming fashion, in tandem with training the target model. We use two key techniques: (1) automatically determining a training loss threshold for skipping backward training passes; (2) running a meta predictor for further skipping forward training passes. We integrate the above techniques in a holistic, three-stage training process. On a diverse set of benchmarks, our method reduces the required training examples by up to 5.3× and training time by up to 6.8×, while incurring only minor accuracy degradation. Our method is effective even for training one epoch, where each training example is encountered only once. It is simple to implement and is compatible with existing finetuning techniques. Code is available at: https://github.com/xo28/efficient-NLP-multistage-training
List of keywords
Machine Learning -> ML: Automated machine learning
Natural Language Processing -> NLP: Language models
Natural Language Processing -> NLP: Text classification
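The first technique in the abstract above, skipping backward passes for examples whose training loss falls below a threshold, can be illustrated with a small streaming filter. This is only a sketch under stated assumptions: the threshold rule used here (a scaled running mean of the losses seen so far, after a short warm-up) and the names `backward_mask`, `warmup`, and `scale` are hypothetical and are not taken from the paper or its repository.

```python
def backward_mask(losses, warmup=2, scale=1.0):
    """Decide, example by example, whether to run the backward pass.

    losses: per-example forward-pass losses in streaming order.
    warmup: number of initial examples that are always trained on
            (assumption: the threshold needs some history first).
    scale:  multiplier on the running mean that forms the threshold
            (assumption: a stand-in for an automatically tuned threshold).
    Returns a list of booleans, True meaning "run backward on this example".
    """
    mask = []
    total = 0.0
    for i, loss in enumerate(losses):
        if i < warmup:
            keep = True
        else:
            threshold = scale * total / i  # running mean of earlier losses
            keep = loss >= threshold       # low-loss examples are skipped
        mask.append(keep)
        total += loss
    return mask


stream = [2.0, 1.0, 0.5, 3.0, 1.0]
print(backward_mask(stream))
```

In this toy stream the third and fifth examples fall below the running threshold and would skip the backward pass, which is where the savings described in the abstract would come from.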
3002
Scalable Verification of Strategy Logic through Three-Valued Abstraction
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. Several fragments of this logic have been defined to tackle this issue, but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
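To illustrate what it means for a three-valued semantics to approximate a two-valued one, as claimed in the abstract above, here is a propositional sketch using Kleene's strong three-valued connectives. This is an assumption made for illustration only: the paper's semantics is defined for Strategy Logic over abstract models, and nothing below is taken from it or from MCMAS. The check confirms that whenever the three-valued result is definite, every two-valued refinement of the undefined inputs gives the same answer.

```python
from itertools import product

TT, UU, FF = 1.0, 0.5, 0.0  # true, undefined, false


def k_not(a):
    return 1.0 - a


def k_and(a, b):
    return min(a, b)


def k_or(a, b):
    return max(a, b)


def formula(p, q, r):
    """Example formula (p and not q) or r; chosen arbitrarily."""
    return k_or(k_and(p, k_not(q)), r)


def refinements(value):
    """Two-valued values that an abstract value may stand for."""
    return [TT, FF] if value == UU else [value]


def is_sound(f, arity):
    """Check that every definite three-valued verdict is preserved
    by all two-valued refinements of the arguments."""
    for args in product([TT, UU, FF], repeat=arity):
        verdict = f(*args)
        if verdict == UU:
            continue  # an undefined verdict promises nothing
        for concrete in product(*(refinements(a) for a in args)):
            if f(*concrete) != verdict:
                return False
    return True


print(is_sound(formula, 3))
```

An undefined verdict is the signal that the abstraction is too coarse and would need refinement before a definite answer can be given.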
3034
Minimizing Reachability Times on Temporal Graphs via Shifting Labels
We study how we can accelerate the spreading of information in temporal graphs via shifting operations, a problem that captures real-world applications ranging from information flows to distribution schedules. In a temporal graph there is a fixed set of vertices, and the available connections between them change over time in a predefined manner. We observe that, in some cases, shifting some connections, i.e., advancing or delaying them, can decrease the time required to reach one vertex from another vertex (the source). We study how we can minimize the maximum time a set of sources needs to reach every vertex, when we are allowed to shift some of the connections. If we restrict the allowed number of changes, we prove that, already for a single source, the problem is NP-hard, and W[2]-hard when parameterized by the number of changes. Then we focus on an unconstrained number of changes. We derive a polynomial-time algorithm when there is one source. When there are two sources, we show that the problem becomes NP-hard; on the other hand, we design an FPT algorithm, parameterized by the treewidth of the graph plus the lifetime of the optimal solution, that works for any number of sources. Finally, we provide polynomial-time algorithms for several graph classes.
List of keywords
Planning and Scheduling -> PS: Theoretical foundations of planning
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Scheduling
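The observation in the abstract above, that shifting a connection can shorten reachability times, can be seen on a three-vertex example. The sketch below computes earliest arrival times from one source with a single pass over time-sorted edges. It assumes undirected edges with positive integer labels and journeys whose labels strictly increase; these modelling choices, the function names, and the example graph are illustrative assumptions rather than the paper's definitions or algorithms.

```python
import math


def earliest_arrival(edges, source):
    """Earliest arrival time at every vertex from `source`.

    edges: list of (u, v, t) meaning u and v are connected at time t > 0.
    A journey must use strictly increasing time labels (assumption).
    Unreachable vertices keep the value math.inf.
    """
    vertices = {x for u, v, _ in edges for x in (u, v)} | {source}
    arrival = {v: math.inf for v in vertices}
    arrival[source] = 0
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        # Evaluate both directions before updating, so one edge use
        # cannot enable the reverse use at the same time step.
        can_reach_v = arrival[u] < t
        can_reach_u = arrival[v] < t
        if can_reach_v:
            arrival[v] = min(arrival[v], t)
        if can_reach_u:
            arrival[u] = min(arrival[u], t)
    return arrival


def max_reach_time(edges, source):
    """Largest earliest-arrival time over all vertices."""
    return max(earliest_arrival(edges, source).values())


original = [("a", "b", 3), ("b", "c", 2)]  # b-c is available too early
shifted = [("a", "b", 1), ("b", "c", 2)]   # a-b advanced from 3 to 1

print(max_reach_time(original, "a"))
print(max_reach_time(shifted, "a"))
```

In the original labelling, vertex c is unreachable from a because the b-c connection occurs before a can reach b; advancing the a-b connection makes every vertex reachable by time 2.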
3037
RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models
The emergence of Neural Radiance Fields (NeRF) has promoted the development of synthesized high-fidelity views of the intricate real world. However, it is still a very demanding task to repaint the content in NeRF. In this paper, we propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes. Our work leverages existing diffusion models to guide changes in the designated 3D content. Specifically, we semantically select the target object, and a pre-trained diffusion model guides the NeRF model to generate new 3D objects, which can improve the editability, diversity, and application range of NeRF. Experimental results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts, including editing appearance, shape, and more. We validate our method on both real-world datasets and synthetic-world datasets for these editing tasks. Please visit https://repaintnerf.github.io for a better view of our results.
List of keywords
Computer Vision -> CV: 3D computer vision
Computer Vision -> CV: Applications
Computer Vision -> CV: Neural generative models, auto encoders, GANs
3040
Dual Video Summarization: From Frames to Captions
Video summarization and video captioning both condense the video content from the perspective of visual and text modes, i.e., keyframe selection and language description generation. Existing video-and-language learning models commonly sample multiple frames for training instead of observing all of them. These sampled deputies greatly improve computational efficiency, but do they represent the original video content well enough, without redundancy? In this work, we propose a dual video summarization framework and verify it in the context of video captioning. Given the video frames, we first extract the visual representation based on the ViT model fine-tuned on the video-text domain. Then we summarize the keyframes according to the frame-level score. To compress the number of keyframes as much as possible while ensuring the quality of captioning, we learn a cross-modal video summarizer to select the most semantically consistent frames according to the pseudo score label. The top K frames (K is no more than 3% of the entire video) are chosen to form the video representation. Moreover, to evaluate the static appearance and temporal information of the video, we design the ranking scheme of video representation from two aspects: feature-oriented and sequence-oriented. Finally, we generate the descriptions with a lightweight LSTM decoder. The experimental results on the MSR-VTT and MSVD datasets reveal that, for a generative task such as video captioning, a small number of keyframes can convey the same semantic information and perform as well on captioning as the original sampling, or even better.
Computer Vision -> CV: Video analysis and understanding
List of keywords 
Computer Vision -> CV: Vision and language  Computer Vision -> CV: Video analysis and understanding
3049
Neuro-Symbolic Class Expression Learning
Models computed using deep learning have been effectively applied to tackle various problems in many disciplines. Yet, the predictions of these models are often at most post-hoc and locally explainable. 
In contrast, class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic machine learning approaches are being successfully applied to learn class expressions, their application at large scale has been hindered by their impractical runtimes. Arguably, the reliance on myopic heuristic functions contributes to this limitation. We propose a novel neuro-symbolic class expression learning model, DRILL, to mitigate this limitation. By learning non-myopic heuristic functions with deep Q-learning, DRILL efficiently steers the standard search procedure in a quasi-ordered search space towards goal states. Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that DRILL converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirm that DRILL converges to goal states significantly faster (p-value < 1%) than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of DRILL, including pre-trained models and training and evaluation scripts.
Machine Learning -> ML: Deep reinforcement learning
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
List of keywords 
Machine Learning -> ML: Neuro-symbolic methods Machine Learning -> ML: Deep reinforcement learning
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
3072
Learning Preference Models with Sparse Interactions of Criteria
Multicriteria decision making requires defining the result of conflicting and possibly interacting criteria. Allowing criteria interactions in a decision model increases the complexity of the preference learning task due to the combinatorial nature of the possible interactions. In this paper, we propose an approach to learn a decision model in which the interaction pattern is revealed from preference data and kept as simple as possible. We consider weighted aggregation functions such as multilinear utilities or Choquet integrals, which admit representations including non-linear terms that measure the joint benefit or penalty attached to some combinations of criteria. The weighting coefficients, known as Möbius masses, model positive or negative synergies among criteria. We propose an approach to learn the Möbius masses, based on iteratively reweighted least squares for sparse recovery, and dualization to improve scalability. This approach is applied to learn sparse representations of the multilinear utility model and conjunctive/disjunctive forms of the discrete Choquet integral from preference examples, in aggregation problems possibly involving more than 20 criteria.
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Uncertainty in AI -> UAI: Decision and utility theory
List of keywords 
Machine Learning -> ML: Learning preferences or rankings Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Uncertainty in AI -> UAI: Decision and utility theory
3073
Simplification and Improvement of MMS Approximation
We consider the problem of fairly allocating a set of indivisible goods among n agents with additive valuations, using the popular fairness notion of maximin share (MMS). Since MMS allocations do not always exist, a series of works established existence results and algorithms for approximate MMS allocations. The Garg-Taki algorithm gives the current best approximation factor of (3/4 + 1/(12n)). Most of these results are based on complicated analyses, especially those providing a factor better than 2/3. Moreover, since no tight example is known for the Garg-Taki algorithm, it is unclear whether this is the best factor achievable with this approach. In this paper, we significantly simplify the analysis of this algorithm and also improve the existence guarantee to a factor of (3/4 + min(1/36, 3/(16n-4))). For small n, this provides a noticeable improvement. Furthermore, we present a tight example for this algorithm, showing that this may be the best factor one can hope for with the current techniques.
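To make the two guarantees in this abstract concrete, the short sketch below evaluates both expressions exactly for a few values of n. It assumes that "1/12n" is to be read as 1/(12n); the function names are illustrative and the code only evaluates the stated formulas, not the allocation algorithm itself.

```python
from fractions import Fraction


def garg_taki_factor(n: int) -> Fraction:
    """Earlier factor, reading 1/12n as 1/(12n) (assumption): 3/4 + 1/(12n)."""
    return Fraction(3, 4) + Fraction(1, 12 * n)


def improved_factor(n: int) -> Fraction:
    """Factor stated in the abstract: 3/4 + min(1/36, 3/(16n - 4))."""
    return Fraction(3, 4) + min(Fraction(1, 36), Fraction(3, 16 * n - 4))


for n in (4, 8, 16, 32):
    old, new = garg_taki_factor(n), improved_factor(n)
    print(f"n={n:2d}  old={float(old):.5f}  new={float(new):.5f}  gain={float(new - old):.5f}")
```

Under this reading, the 1/36 term is the binding one up to n = 7, and the 3/(16n-4) term takes over from n = 8 onward.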
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3078
Negative Flux Aggregation to Estimate Feature Attributions
There are increasing demands for understanding the behavior of deep neural networks (DNNs), spurred by growing security and/or transparency concerns. Due to the multi-layer nonlinearity of deep neural network architectures, explaining DNN predictions remains an open problem, preventing us from gaining a deeper understanding of the underlying mechanisms. To enhance the explainability of DNNs, we estimate the input features' attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution maps. Unlike previous techniques, ours neither relies on fitting a surrogate model nor needs any path integration of gradients. Both qualitative and quantitative experiments demonstrate the superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at https://github.com/xinli0928/NeFLAG.
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
3093
Spotlight News Driven Quantitative Trading Based on Trajectory Optimization
News-driven quantitative trading (NQT) has been popularly studied in recent years. Most existing NQT methods follow a two-step paradigm, i.e., first analyzing markets through a financial prediction task and then making trading decisions, which is doomed to failure due to the nearly futile financial prediction task. To bypass the financial prediction task, in this paper, we focus on the reinforcement learning (RL) based NQT paradigm, which leverages news to make profitable trading decisions directly. We propose a novel NQT framework, SpotlightTrader, based on decision trajectory optimization, which can effectively stitch together a continuous and flexible sequence of trading decisions to maximize profits. In addition, we enhance this framework by constructing a spotlight-driven state trajectory that obeys a stochastic process with irregular abrupt jumps caused by spotlight news. Furthermore, in order to adapt to non-stationary financial markets, we propose an effective training pipeline for this framework, which blends offline pretraining with online finetuning to balance exploration and exploitation effectively during online trading. Extensive experiments on three real-world datasets demonstrate our proposed model's superiority over the state-of-the-art NQT methods.
Machine Learning -> ML: Deep reinforcement learning
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Finance Machine Learning -> ML: Deep reinforcement learning
3095
Learning Small Decision Trees with Large Domain
One favors decision trees (DTs) of the smallest size or depth to facilitate explainability and interpretability. However, learning such an optimal DT from data is well-known to be NP-hard. To overcome this complexity barrier, Ordyniak and Szeider (AAAI 21) initiated the study of optimal DT learning under the parameterized complexity perspective. They showed that solution size (i.e., number of nodes or depth of the DT) is insufficient to obtain fixed-parameter tractability (FPT). Therefore, they proposed an FPT algorithm that utilizes two auxiliary parameters: the maximum difference (as a structural property of the data set) and maximum domain size. They left it as an open question of whether bounding the maximum domain size is necessary.
The main result of this paper answers this question. We present FPT algorithms for learning a smallest or lowest-depth DT from data, with solution size and maximum difference as the only parameters. Thus, our algorithm is significantly more potent than the one by Ordyniak and Szeider, as it can handle problem inputs with features that range over unbounded domains. We also close several gaps concerning the quality of approximation one obtains by only considering DTs based on minimum support sets.
Machine Learning -> ML: Explainable/Interpretable machine learning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning Machine Learning -> ML: Explainable/Interpretable machine learning
3109
Safe Multi-agent Learning via Trapping Regions
One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms since, in general, a collection of individual, self-serving agents learning concurrently is not guaranteed to converge in their joint policy. This is in stark contrast to most single-agent environments and sets a prohibitive barrier for deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where learning dynamics are not known. We demonstrate the applications to a regularized version of the Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in the state-of-the-art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
3115
Fairly Allocating Goods and (Terrible) Chores
We study the fair allocation of a mixture of indivisible goods and chores under lexicographic preferences, a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation have remained a notable open problem. By identifying a class of instances with "terrible chores", we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and Pareto optimal allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.
Game Theory and Economic Paradigms -> GTEP: Computational social choice
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Game Theory and Economic Paradigms -> GTEP: Computational social choice
3117
Finding an ϵ-Close Minimal Variation of Parameters in Bayesian Networks
This paper addresses the ε-close parameter tuning problem for Bayesian networks (BNs): find a minimal ε-close amendment of probability entries in a given set of (rows in) conditional probability tables (CPTs) that makes a given quantitative constraint on the BN valid. Based on the state-of-the-art “region verification” techniques for parametric Markov chains, we propose an algorithm whose capabilities go beyond any existing techniques. Our experiments show that ε-close tuning of large BN benchmarks with up to eight parameters is feasible. In particular, by allowing (i) varied parameters in multiple CPTs and (ii) inter-CPT parameter dependencies, we treat subclasses of parametric BNs that have received scant attention so far.
Uncertainty in AI -> UAI: Graphical models
Uncertainty in AI -> UAI: Tractable probabilistic models
List of keywords 
Uncertainty in AI -> UAI: Bayesian networks Uncertainty in AI -> UAI: Graphical models
Uncertainty in AI -> UAI: Tractable probabilistic models
3127
Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size
Tsetlin Machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). As such, they become less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning — Clause Size Constrained TMs (CSC-TMs) —  where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches one literal. We finally analyze CSC-TM power consumption and derive new convergence properties.
Machine Learning -> ML: Other
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Machine Learning -> ML: Applications
List of keywords 
Machine Learning -> ML: Explainable/Interpretable machine learning Machine Learning -> ML: Other
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Machine Learning -> ML: Applications
3138
Sketch Recognition via Part-based Hierarchical Analogical Learning
Sketch recognition has been studied for decades, but it is far from solved. Drawing styles are highly variable across people and adapting to idiosyncratic visual expressions requires data-efficient learning. Explainability also matters, so that users can see why a system got confused about something. This paper introduces a novel part-based approach for sketch recognition, based on hierarchical analogical learning, a new method to apply analogical learning to qualitative representations. Given a sketched object, our system automatically segments it into parts and constructs multi-level qualitative representations of them. Our approach performs analogical generalization at multiple levels of part descriptions and uses coarse-grained results to guide interpretation at finer levels. Experiments on the Berlin TU dataset and the Coloring Book Objects dataset show that the system can learn explainable models in a data-efficient manner.
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
List of keywords 
Humans and AI -> HAI: Cognitive modeling Knowledge Representation and Reasoning -> KRR: Case-based reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
3139
The Computational Complexity of Single-Player Imperfect-Recall Games
We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and another one uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings relate those three solution concepts of a game to solution concepts of a polynomial maximization problem: global optima, optimal points with respect to subsets of variables and Karush–Kuhn–Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of such strategies. For ex-ante optimality and (EDT,GDH)-equilibria, we obtain NP-hardness and inapproximability, and for (CDT,GT)-equilibria we obtain CLS-completeness results.
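The link between ex-ante optimality and polynomial maximization mentioned in this abstract can be illustrated on the Absentminded Driver game. The payoffs used below (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both) are the ones commonly used in textbook presentations and are an assumption here, not taken from the paper; the grid search merely stands in for a proper optimizer.

```python
from fractions import Fraction


def expected_payoff(p: Fraction) -> Fraction:
    """Ex-ante expected payoff when the driver continues with probability p.

    The driver cannot tell the two intersections apart, so the same p applies
    at both. Assumed payoffs: exit first = 0, exit second = 4, continue twice = 1.
    """
    exit_first = (1 - p) * 0
    exit_second = p * (1 - p) * 4
    continue_twice = p * p * 1
    return exit_first + exit_second + continue_twice  # equals 4p - 3p^2


# Exact grid search over p in {0, 1/300, ..., 1}.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=expected_payoff)
print(best_p, expected_payoff(best_p))
```

In this toy instance the ex-ante optimal strategy is the global maximum of a univariate polynomial in p; games with several information sets lead to multivariate polynomials of the kind the abstract refers to.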
Uncertainty in AI -> UAI: Decision and utility theory
Game Theory and Economic Paradigms -> GTEP: Other
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games Uncertainty in AI -> UAI: Decision and utility theory
Game Theory and Economic Paradigms -> GTEP: Other
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
3140
Can You Improve My Code? Optimizing Programs with Local Search
This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program’s performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants’ programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems. These results suggest that POLIS could  be used as a helpful programming assistant for programming problems with measurable objectives.
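The line-by-line improvement loop described in this abstract can be sketched roughly as follows. This is not the POLIS implementation: a small fixed pool of candidate statements stands in for the brute-force synthesizer, and the toy objective, the starting program, and all names (`run`, `score`, `line_wise_local_search`, `POOL`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def run(program: Sequence[str]) -> int:
    """Execute the program's lines in order on a variable x that starts at 1."""
    env = {"x": 1}
    for line in program:
        exec(line, env)  # toy interpreter; only for trusted, hard-coded lines
    return env["x"]


def score(program: Sequence[str], target: int = 8) -> int:
    """Toy measurable objective: the closer the final x is to target, the higher."""
    return -abs(run(program) - target)


def line_wise_local_search(
    program: Sequence[str],
    candidates: Sequence[str],
    objective: Callable[[Sequence[str]], int],
) -> Tuple[List[str], int]:
    """Improve one line at a time, keeping the others fixed, until no change helps."""
    best = list(program)
    best_score = objective(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for cand in candidates:
                trial = best[:i] + [cand] + best[i + 1:]
                trial_score = objective(trial)
                if trial_score > best_score:  # accept strict improvements only
                    best, best_score = trial, trial_score
                    improved = True
    return best, best_score


POOL = ["x += 1", "x *= 2", "x -= 3", "pass"]  # stand-in for synthesized lines
start = ["pass", "pass", "pass"]
final_program, final_score = line_wise_local_search(start, POOL, score)
print(final_program, final_score)
```

Because only strict improvements are accepted and the candidate pool is finite, the loop terminates at a program that no single-line replacement from the pool can improve.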
Humans and AI -> HAI: Human-AI collaboration
List of keywords 
Humans and AI -> HAI: Applications Humans and AI -> HAI: Human-AI collaboration
3153
Disentanglement of Latent Representations via Causal Interventions
The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
Computer Vision -> CV: Representation learning
Machine Learning -> ML: Autoencoders
List of keywords 
Knowledge Representation and Reasoning -> KRR: Causality Computer Vision -> CV: Representation learning
Machine Learning -> ML: Autoencoders
3155
Exploring Structural Similarity in Fitness Landscapes via Graph Data Mining: A Case Study on Number Partitioning Problems
One of the most common problem-solving heuristics is reasoning by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus, if a solver works for one problem instance, we expect it to also be effective for other instances whose fitness landscapes share structural similarities with it. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, using local optima networks as a proxy for fitness landscapes, this paper proposes to leverage graph data mining techniques to conduct qualitative and quantitative analyses that explore the latent topological structural information embedded in those landscapes. In our experiments, we use the number partitioning problem as a case study, and our empirical results support the overall assumption that structural similarity exists between landscapes within neighboring dimensions. In addition, experiments on simulated annealing demonstrate that the performance of a meta-heuristic solver is similar on structurally similar landscapes.
Data Mining -> DM: Data visualization
Data Mining -> DM: Exploratory data mining
Data Mining -> DM: Mining graphs
List of keywords 
Search -> S: Combinatorial search and optimisation Data Mining -> DM: Data visualization
Data Mining -> DM: Exploratory data mining
Data Mining -> DM: Mining graphs
3162
DiffAR: Adaptive Conditional Diffusion Model for Temporal-augmented Human Activity Recognition
Human activity recognition (HAR) is a fundamental sensing and analysis technique that supports diverse applications, such as smart homes and healthcare. In device-free and non-intrusive HAR, WiFi channel state information (CSI) captures wireless signal variations caused by human interference without the need for video cameras or on-body sensors. However, current CSI-based HAR performance is hampered by incomplete CSI recordings due to fixed window sizes in CSI collection and human/machine errors that incur missing values in CSI. To address these issues, we propose DiffAR, a temporal-augmented HAR approach that improves HAR performance by augmenting CSI. DiffAR devises a novel Adaptive Conditional Diffusion Model (ACDM) to synthesize augmented CSI, which tackles the issue of fixed windows by forecasting and handles missing values with imputation. Compared to existing diffusion models, ACDM improves the synthesis quality by guiding progressive synthesis with step-specific conditions. DiffAR further exploits an ensemble classifier for activity recognition using both raw and augmented CSI. Extensive experiments on four public datasets show that DiffAR achieves the best synthesis quality of augmented CSI and outperforms state-of-the-art CSI-based HAR methods in recognition performance. The source code of DiffAR is available at https://github.com/huangshk/DiffAR.
Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Applications
List of keywords 
Machine Learning -> ML: Semi-supervised learning Machine Learning -> ML: Time series and data streams
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Applications
3171
On Optimal Strategies for Wordle and General Guessing Games
The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contributions are several theorems that build towards a general theory for proving the optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.
Multidisciplinary Topics and Applications -> MDA: Game playing
Search -> S: Applications
List of keywords 
Search -> S: Combinatorial search and optimisation Multidisciplinary Topics and Applications -> MDA: Game playing
Search -> S: Applications
3174
Cognitively Inspired Learning of Incremental Drifting Concepts
Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference.
Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems
List of keywords 
Humans and AI -> HAI: Cognitive modeling Humans and AI -> HAI: Brain sciences
Humans and AI -> HAI: Cognitive systems
3195
Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models
In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., “brown” to “br0wn”). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters.  We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
List of keywords 
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods Computer Vision -> CV: Neural generative models, auto encoders, GANs
3200
RZCR: Zero-shot Character Recognition via Radical-based Reasoning
The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets. Character image datasets are also affected by such unbalanced data distribution due to differences in character usage frequency. Thus, current character recognition methods are limited when applied in the real world, especially for the categories in the tail that lack training samples, e.g., uncommon characters. In this paper, we propose a zero-shot character recognition framework via radical-based reasoning, called RZCR, to improve the recognition performance of few-sample character categories in the tail. Specifically, we exploit radicals, the graphical units of characters, by decomposing and reconstructing characters according to orthography. RZCR consists of a visual semantic fusion-based radical information extractor (RIE) and a knowledge graph character reasoner (KGR). RIE aims to recognize candidate radicals and their possible structural relations from character images in parallel. The results are then fed into KGR to recognize the target character by reasoning with a knowledge graph. We validate our method on multiple datasets, and RZCR shows promising experimental results, especially on few-sample character datasets.
Multidisciplinary Topics and Applications -> MDA: Humanities
List of keywords 
Computer Vision -> CV: Vision and language  Multidisciplinary Topics and Applications -> MDA: Humanities
3221
Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification
Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noises of graphs induced by hidden anomalies connected with considerable benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noises and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world datasets of GAD demonstrate that the proposed framework achieves significantly better detection quality compared with the state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.
Data Mining -> DM: Anomaly/outlier detection
Data Mining -> DM: Mining graphs
List of keywords 
Data Mining -> DM: Applications Data Mining -> DM: Anomaly/outlier detection
Data Mining -> DM: Mining graphs
3222
Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models
Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, we zero (prune) unimportant model weights and train a derived compact model to improve final accuracy. Pruned weights are treated as useless and discarded. This usually leads to significant degradation in model accuracy. In this paper, we focus on attention head pruning, as attention heads are a key component of Transformer-based language models and provide interpretable knowledge. We reveal the relationship between pruned attention heads and retained heads and provide a solution, named peer distillation, to recycle the discarded knowledge from the pruned heads. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, removing the time-consuming human labor of tuning hyperparameters. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using the BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing heads by over 58% on average and outperforms state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).
List of keywords 
Natural Language Processing -> NLP: Language models
3243
MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning
Fine-tuning large-scale pre-trained language models has been demonstrated to be effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such uses of adversarial training correspond to pure-strategy games, which are inherently limited in the scope of their strategies and thus leave room for improvement. To push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent and establish MAT via a sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark experiments on large-scale pre-trained models, such as BERT and RoBERTa. MAT significantly outperforms the state-of-the-art methods on both the GLUE and ANLI benchmarks in terms of generalization and robustness.
List of keywords
Machine Learning -> ML: Adversarial machine learning
Natural Language Processing -> NLP: Other
3254
SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs
Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As instances that appear in the real world are naturally connected and can be represented with graphs, graph neural networks have become increasingly popular in tackling the anomaly detection problem. Despite the promising results, research on anomaly detection has almost exclusively focused on static graphs, while the mining of anomalous patterns from dynamic graphs is rarely studied despite its significant application value. In addition, anomaly detection is typically tackled from semi-supervised perspectives due to the lack of sufficient labeled data. However, most proposed methods are limited to merely exploiting labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By combining a time-equipped memory bank and a pseudo-label contrastive learning module, SAD is able to fully exploit the potential of large unlabeled samples and uncover underlying anomalies on evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies from dynamic graphs and outperforms existing advanced methods even when provided with only a small amount of labeled data.
List of keywords
Data Mining -> DM: Anomaly/outlier detection
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Time series and data streams
3260
Beyond Pure Text: Summarizing Financial Reports Based on Both Textual and Tabular Data
Abstractive text summarization aims to generate concise summaries that preserve both the salient information and the overall semantic meaning of the given documents. However, real-world documents, e.g., financial reports, generally contain rich data such as charts and tabular data, which invalidate most existing text summarization approaches. This paper is thus motivated to propose a novel approach that simultaneously summarizes both textual and tabular data. In particular, we first manually construct a “table+text → summary” dataset. Then, the tabular data is embedded in both a row-wise and a column-wise manner, and the textual data is encoded at the sentence level via a pre-trained model. We propose a salient detector gate that is applied between each pair of row/column and sentence embeddings. The highly correlated content is considered salient information that must be summarized. Extensive experiments have been performed on our constructed dataset, and the promising results demonstrate the effectiveness of the proposed approach w.r.t. a number of both automatic and human evaluation criteria.
List of keywords
Natural Language Processing -> NLP: Summarization
Natural Language Processing -> NLP: Applications
Natural Language Processing -> NLP: Language generation
3263
On Adversarial Robustness of Demographic Fairness in Face Attribute Recognition
Demographic fairness has become a critical objective when developing modern visual models for identity-sensitive applications, such as face attribute recognition (FAR). While great efforts have been made to improve the fairness of the models, the adversarial robustness of the fairness (e.g., whether the fairness of the models can still be maintained under potential malicious fairness attacks) has been largely ignored. Therefore, this paper explores the adversarial robustness of demographic fairness in FAR applications from both attacking and defending perspectives. In particular, we first present a novel fairness attack, which aims at corrupting the demographic fairness of face attribute classifiers. Next, to mitigate the effect of the fairness attack, we design an efficient defense algorithm called robust-fair training. With this defense, face attribute classifiers learn how to combat the bias introduced by the fairness attack. As such, the face attribute classifiers are not only trained to be fair, but the fairness is also robust. Our extensive experimental results show the effectiveness of both our proposed attack and defense methods across various model architectures and FAR applications. We believe our work can serve as a strong baseline for future work on robust-fair AI models.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Bias
Computer Vision -> CV: Bias, fairness and privacy
3271
IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation
Knowledge distillation (KD) is an effective method for transferring the knowledge of a teacher model to a student model, with the aim of efficiently improving the latter’s performance. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements on various tasks, student networks show only marginal improvements due to their limited model capacity. In this work, to address the student model’s limitation, we propose a novel flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) to improve the overall performance of the student model by directly distilling the teacher’s knowledge into branches of student models. The outputs generated by the IFD, which is trained by the teacher model, are effectively combined by attentive logit. We use only a few blocks of the student and the trained IFD during inference, requiring an equal or smaller number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin on various datasets and tasks without extra computation.
List of keywords
Computer Vision -> CV: Structural and model-based approaches, knowledge representation and reasoning
Computer Vision -> CV: Representation learning
3273
Random Assignment of Indivisible Goods under Constraints
We investigate the problem of random assignment of indivisible goods, in which each agent has an ordinal preference and a constraint. Our goal is to characterize the conditions under which there always exists a random assignment that simultaneously satisfies efficiency and envy-freeness. The probabilistic serial mechanism ensures the existence of such an assignment for the unconstrained setting. In this paper, we consider a more general setting in which each agent can consume a set of items only if the set satisfies her feasibility constraint. Such constraints must be taken into account in student course placements, employee shift assignments, and so on. We demonstrate that an efficient and envy-free assignment may not exist even for the simple case of partition matroid constraints, where the items are categorized, and each agent demands one item from each category. We then identify special cases in which an efficient and envy-free assignment always exists. For these cases, the probabilistic serial cannot be naturally extended; therefore, we provide mechanisms to find the desired assignment using various approaches.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Game Theory and Economic Paradigms -> GTEP: Mechanism design
3281
Some General Identification Results for Linear Latent Hierarchical Causal Structure
We study the problem of learning hierarchical causal structure among latent variables from measured variables. While some existing methods are able to recover the latent hierarchical causal structure, they mostly rely on restrictive assumptions, including the tree-structured graph constraint, the absence of “triangle” structures, and non-Gaussianity. In this paper, we relax these restrictions and consider a more general and challenging scenario in which graphs beyond tree structures, “triangle” structures, and arbitrary noise distributions are allowed. We investigate the identifiability of the latent hierarchical causal structure and show that by using second-order statistics, the latent hierarchical structure can be identified up to the Markov equivalence classes over latent variables. Moreover, some directions in the Markov equivalence classes of latent variables can be further identified using partially non-Gaussian data. Based on the theoretical results above, we design an effective algorithm for learning the latent hierarchical causal structure. The experimental results on synthetic data verify the effectiveness of the proposed method.
List of keywords
Machine Learning -> ML: Causality
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
3294
MMPN: Multi-supervised Mask Protection Network for Pansharpening
Pansharpening fuses a panchromatic (PAN) image with a multispectral (MS) image to obtain a high-spatial-resolution multispectral (HRMS) image. Deep learning-based pansharpening methods usually apply the convolution operation to extract features and only consider the similarity of gradient information between PAN and HRMS images, resulting in edge blur and spectral distortion in the fusion results. To solve these problems, a multi-supervised mask protection network (MMPN) is proposed to prevent spatial information from being damaged and to overcome spectral distortion in the learning process. First, by analyzing the relationships between high-resolution images and the corresponding degraded images, a mask protection strategy (MPS) for edge protection is designed to guide the recovery of fused images. Then, based on the MPS, an MMPN containing four branches is constructed to generate the fusion and mask protection images. In MMPN, each branch employs a dual-stream multi-scale feature fusion module (DMFFM), which is built to extract and fuse the features of two input images. Finally, different loss terms are defined for the four branches and combined into a joint loss function for network training. Experiments on simulated and real satellite datasets show that our method is superior to state-of-the-art methods both subjectively and objectively.
List of keywords
Computer Vision -> CV: Applications
Computer Vision -> CV: Machine learning for vision
3308
Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning
Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL approaches cannot manage the mutual influence among multiple data consumers competing to enlist data owners. Moreover, they do not allow a single data owner to join multiple data consumers simultaneously. To bridge these gaps, we propose the Multi-Agent Reinforcement Learning for AFL (MARL-AFL) approach to steer data consumers to bid strategically towards an equilibrium with desirable overall system characteristics. We design a temperature-based reward reassignment scheme to make tradeoffs between cooperation and competition among AFL data consumers. In this way, it can reach an equilibrium state that ensures individual data consumers can achieve good utility, while preserving system-level social welfare. To circumvent potential collusion behaviors among data consumers, we introduce a bar agent to set a personalized bidding lower bound for each data consumer. Extensive experiments on six commonly adopted benchmark datasets show that MARL-AFL is significantly more advantageous compared to six state-of-the-art approaches, outperforming the best by 12.2%, 1.9% and 3.4% in terms of social welfare, revenue and accuracy, respectively.
List of keywords
Machine Learning -> ML: Federated learning
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
Machine Learning -> ML: Reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
3314
iRe2f: Rethinking Effective Refinement in Language Structure Prediction via Efficient Iterative Retrospecting and Reasoning
Refinement plays a critical role in language structure prediction, a process that deals with complex situations such as structural edge interdependencies. Since language structure prediction is usually modeled as graph parsing, typical refinement methods take an initial parsing graph as input and refine it using the language input and other relevant information. Intuitively, a refinement component, i.e., a refiner, should be lightweight and efficient, as it is only responsible for correcting faults in the initial graph. However, current refiners add a significant burden to the parsing process due to their reliance on time-consuming encoding-decoding procedures on the language input and graph. To make the refiner more practical for real-world applications, this paper proposes a lightweight but effective iterative refinement framework, iRe^2f, based on iterative retrospecting and reasoning without involving the re-encoding process on the graph. iRe^2f iteratively refines the parsing graph based on the interaction between graph and sequence and efficiently learns the shortcut to update the sequence and graph representations in each iteration. The shortcut is calculated based on the graph representation in the latest iteration. iRe^2f reduces the number of refinement parameters by 90% compared to the previous smallest refiner. Experiments on a variety of language structure prediction tasks show that iRe^2f performs comparably to or better than current state-of-the-art refiners, with a significant increase in efficiency.
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Tagging, chunking, and parsing
3319
Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting
Human pose forecasting is a sequential modeling task that aims to predict future poses from historical motions. Most existing approaches focus on the design of spatial-temporal neural network models that learn movement patterns to reduce prediction errors. However, they usually do not strictly follow the temporal constraints in the inference stage. Even though a small Mean Per Joint Position Error (MPJPE) is achieved, some of the predicted poses are not temporally feasible solutions, which violates the continuity of body movement. In this paper, we consider temporal constrained feasible solutions for human pose forecasting, where the poses predicted from input historical poses are guaranteed to strictly obey the temporal constraints in the inference stage. Rather than directly supervising the prediction in the original pose space, a temporal constrained subspace is explicitly learned, followed by an inverse transformation to obtain the final predictions. We evaluate the proposed method on large-scale benchmarks, including Human3.6M, AMASS, and 3DPW. State-of-the-art performance has been achieved with the temporal constrained feasible solutions.
List of keywords 
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
3349
Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation
Modern Natural Language Processing (NLP) models exhibit under-sensitivity towards text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes remain human-readable. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish example generation with two major merits. First, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which causes less degradation in the model’s confidence. Second, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to jump out of local optima with some probability. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose the fact that NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences.
List of keywords
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
Natural Language Processing -> NLP: Text classification
3352
Detecting Adversarial Faces Using Only Real Face Self-Perturbations
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend against newly emerging attacks that are unseen by defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems and to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on the LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
List of keywords
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Computer Vision -> CV: Applications
Computer Vision -> CV: Biometrics, face, gesture and pose recognition
3357
Spatially Constrained Adversarial Attack Detection and Localization in the Representation Space of Optical Flow Networks
Optical flow estimation has shown significant improvements with advances in deep neural networks. However, these flow networks have recently been shown to be vulnerable to patch-based adversarial attacks, which pose security risks in real-world applications, such as self-driving cars and robotics. We propose SADL, a Spatially constrained adversarial Attack Detection and Localization framework, to detect and localize these patch-based attacks without requiring dedicated training. The detection of an attacked input sequence is performed via iterative optimization on the features from the inner layers of flow networks, without any prior knowledge of the attacks. The novel spatially constrained optimization ensures that the detected anomalous subset of features comes from a local region. To this end, SADL provides a subset of nodes within a spatial neighborhood that contribute more to the detection, which is then used to localize the attack in the input sequence. The proposed SADL is validated across multiple datasets and flow networks. With patch attacks sized at 4.8% of the input image resolution on RAFT, our method successfully detects and localizes them with an average precision of 0.946 and 0.951 for the KITTI-2015 and MPI-Sintel datasets, respectively. The results show that SADL consistently achieves higher detection rates than existing methods and provides new localization capabilities.
List of keywords
Computer Vision -> CV: Motion and tracking
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
3364
Fast Algorithms for SAT with Bounded Occurrences of Variables
We present fast algorithms for the general CNF satisfiability problem (SAT) with running-time bound O*({c_d}^n), where c_d is a function of the maximum occurrence d of variables (d can also be the average occurrence when each variable appears at least twice), and n is the number of variables in the input formula. Similar to SAT with bounded clause lengths, SAT with bounded occurrences of variables has also been extensively studied in the literature. In particular, the running-time bounds for small values of d, such as d=3 and d=4, have become bottlenecks for algorithms evaluated by the formula length L and for other algorithms. In this paper, we show that SAT can be solved in time O*(1.1238^n) for d=3 and O*(1.2628^n) for d=4, improving the previous results O*(1.1279^n) and O*(1.2721^n) obtained by Wahlström (SAT 2005) nearly 20 years ago. For d>=5, we obtain a running time bound of O*(1.0641^{dn}), implying a bound of O*(1.0641^L) with respect to the formula length L, which is also a slight improvement over the previous bound.
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Satisfiability
3365
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
The ubiquity of Reinforcement Learning (RL) has instigated research on potential threats to its training and deployment. Many works study single-learner training-time attacks that "pre-programme" behavioral triggers into a strategy. However, attacks on collections of learning agents remain largely overlooked. We remedy the situation by developing a constructive training-time attack on a population of learning agents and additionally make the attack agnostic to the population’s size. The attack constitutes a sequence of environment (re)parameterizations (poisonings), generated to overcome individual differences between agents and lead the entire population to the same target behavior while minimizing effective environment modulation. Our method is demonstrated on populations of independent learners in "ghost" environments (learners do not interact or perceive each other) as well as environments with mutual awareness, with or without individual learning. From the attack perspective, we pursue an ultra-blackbox setting, i.e., the attacker’s training utilizes only across-policy traces of the victim learners for both attack conditioning and evaluation. The resulting uncertainty in population behavior is managed via a novel Wasserstein distance-based Gaussian embedding of behaviors detected within the victim population. To align with prior works on environment poisoning, our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable. Code and Appendices are available at "bit.ly/github-rb-cep".
List of keywords
Machine Learning -> ML: Adversarial machine learning
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Deep reinforcement learning
3369
Hierarchical Prompt Learning for Compositional Zero-Shot Recognition
Compositional Zero-Shot Learning (CZSL) aims to imitate the powerful generalization ability of human beings to recognize novel compositions of known primitive concepts that correspond to a state and an object, e.g., purple apple. To fully capture the intra- and inter-class correlations between compositional concepts, in this paper, we propose to learn them in a hierarchical manner. Specifically, we set up three hierarchical embedding spaces that respectively model the states, the objects, and their compositions, which serve as three “experts” that can be combined in inference for more accurate predictions. We achieve this based on the recent success of large-scale pretrained vision-language models, e.g., CLIP, which provides a strong initial knowledge of image-text relationships. To better adapt this knowledge to CZSL, we propose to learn three hierarchical prompts by explicitly fixing the unrelated word tokens in the three embedding spaces. Despite its simplicity, our proposed method consistently yields superior performance over current state-of-the-art approaches on three widely-used CZSL benchmarks.
List of keywords
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
3378
The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets
The use of AI-based decision aids in diverse domains has inspired many empirical investigations into how AI models’ decision recommendations impact humans’ decision accuracy in AI-assisted decision making, while explorations of the impacts on humans’ decision fairness are largely lacking despite their clear importance. In this paper, using a real-world business decision making scenario, bidding in rental housing markets, as our testbed, we present an experimental study on how the bias level of the AI-based decision aid and the provision of AI explanations affect the fairness level of humans’ decisions, both during and after their usage of the decision aid. Our results suggest that when people are assisted by an AI-based decision aid, both a higher level of racial bias exhibited by the decision aid and, surprisingly, the presence of AI explanations result in more unfair human decisions across racial groups. Moreover, these impacts arise partly through triggering humans’ “disparate interactions” with AI. However, regardless of the AI bias level and the presence of AI explanations, when people return to make independent decisions after their usage of the AI-based decision aid, their decisions no longer exhibit significant unfairness across racial groups.
List of keywords
Humans and AI -> HAI: Human-AI collaboration
Humans and AI -> HAI: Human-computer interaction
3386
Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs
A concept-based classifier can explain the decision process of a deep learning model through human-understandable concepts in image classification problems. However, concept-based explanations may sometimes produce false positives, misidentifying unrelated concepts as important for the prediction task. Our goal is to find the statistically significant concepts for classification to prevent misinterpretation. In this study, we propose a method that uses a deep learning model to learn the image concepts and then uses knockoff samples to select the important concepts for prediction while controlling the False Discovery Rate (FDR) under a certain value. We evaluate the proposed method in experiments on both synthetic and real data. The results show that our method can control the FDR properly while selecting highly interpretable concepts to improve the trustworthiness of the model.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Trustworthy AI
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Explainable/Interpretable machine learning
3395
OptIForest: Optimal Isolation Forest for Anomaly Detection
Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency; e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use a binary structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question of the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash, enabling more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmark datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning based methods.
Machine Learning -> ML: Ensemble methods
List of keywords 
Data Mining -> DM: Anomaly/outlier detection Machine Learning -> ML: Ensemble methods
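As background for the isolation mechanism mentioned in the abstract above, here is a minimal sketch of how the classic binary iForest turns an average isolation depth into an anomaly score. It is not OptIForest and says nothing about the optimal branching factor; the function names are illustrative assumptions, and the harmonic number is approximated with the Euler-Mascheroni constant.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """Expected path length c(n) of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """Classic iForest score: close to 1 for anomalies, about 0.5 for typical points."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


if __name__ == "__main__":
    n = 256  # sub-sample size used to build each tree (assumed value)
    print(anomaly_score(2.0, n))   # isolated quickly: higher score
    print(anomaly_score(12.0, n))  # isolated slowly: lower score
```

Points that are isolated after only a few splits get short paths and therefore scores closer to 1.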
3398
Action Space Reduction for Planning Domains
Planning tasks succinctly represent labeled transition systems, with each ground action corresponding to a label. This granularity, however, is not necessary for solving planning tasks and can be harmful, especially for model-free methods. In order to apply such methods, the label sets are often manually reduced. In this work, we propose automating this manual process. We characterize a valid label reduction for classical planning tasks and propose an automated way of obtaining such valid reductions by leveraging lifted mutex groups. Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning. The code and supplementary material are available at https://github.com/IBM/Parameter-Seed-Set.
List of keywords 
Planning and Scheduling -> PS: Theoretical foundations of planning
3414
Dual Personalization on Federated Recommendation
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions combine distributed recommendation algorithms with privacy-preserving mechanisms, and thus inherently take the form of heavyweight models at the server, which hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views of item representations, which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed new light on the design of recommender systems in federated settings. The code is available.
Data Mining -> DM: Privacy-preserving data mining
Data Mining -> DM: Recommender systems
List of keywords 
Machine Learning -> ML: Federated learning Data Mining -> DM: Privacy-preserving data mining
Data Mining -> DM: Recommender systems
3422
Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays
Evacuation planning is a crucial part of disaster management. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that combines heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion and to find optimized plans that account for such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of three different metrics. However, when congestion-dependent delay is considered, MIP-LNS-SIM outperforms MIP-LNS in multiple performance metrics. In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time compared to MIP-LNS.
Agent-based and Multi-agent Systems -> MAS: Applications
Search -> S: Heuristic search
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
List of keywords 
Planning and Scheduling -> PS: Search in planning and scheduling Agent-based and Multi-agent Systems -> MAS: Applications
Search -> S: Heuristic search
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
3423
New Algorithms for the Fair and Efficient Allocation of Indivisible Chores
We study the problem of fairly and efficiently allocating indivisible chores among agents with additive disutility functions. We consider the widely used envy-based fairness properties of EF1 and EFX in conjunction with the efficiency property of fractional Pareto-optimality (fPO). Existence (and computation) of an allocation that is simultaneously EF1/EFX and fPO are challenging open problems, and we make progress on both of them. We show the existence of an allocation that is
– EF1 + fPO, when there are three agents,
– EF1 + fPO, when there are at most two disutility functions,
– EFX + fPO, for three agents with bivalued disutility functions.
These results are constructive, based on strongly polynomial-time algorithms. We also investigate non-existence and show that an allocation that is EFX+fPO need not exist, even for two agents.
Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Agent-based and Multi-agent Systems -> MAS: Resource allocation
Game Theory and Economic Paradigms -> GTEP: Computational social choice
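To make the EF1 notion for chores concrete, here is a minimal sketch that checks whether a given allocation is envy-free up to one chore. It assumes additive, non-negative disutilities stored as `disutility[i][c]` and bundles given as lists of chore indices; the function name and data layout are illustrative assumptions. It only verifies an allocation and is not the paper's algorithm for constructing one.

```python
from typing import Sequence


def is_ef1_chores(disutility: Sequence[Sequence[float]],
                  bundles: Sequence[Sequence[int]]) -> bool:
    """Return True if no agent envies another after dropping one chore of its own."""
    n = len(bundles)
    for i in range(n):
        if not bundles[i]:
            continue  # an agent with no chores has disutility 0 and envies nobody
        own = sum(disutility[i][c] for c in bundles[i])
        # Dropping the chore agent i dislikes most is the most favourable removal.
        best_after_removal = own - max(disutility[i][c] for c in bundles[i])
        for j in range(n):
            if j == i:
                continue
            other = sum(disutility[i][c] for c in bundles[j])
            if best_after_removal > other:
                return False
    return True


if __name__ == "__main__":
    d = [[1, 2, 3],   # agent 0's disutility for chores 0, 1, 2
         [3, 2, 1]]   # agent 1's disutility for chores 0, 1, 2
    print(is_ef1_chores(d, [[0], [1, 2]]))     # balanced split
    print(is_ef1_chores(d, [[0, 1, 2], []]))   # agent 0 does everything
```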
3434
pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting
Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model relies heavily on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be averaged straightforwardly over different models, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on the Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. In addition, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.
Machine Learning -> ML: Time series and data streams
List of keywords 
Machine Learning -> ML: Probabilistic machine learning Machine Learning -> ML: Time series and data streams
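The convergence statement in the abstract above can be illustrated on the hidden-state chain alone. The sketch below assumes an ergodic chain with a row-stochastic transition matrix `P`; the matrix values and function names are made up for illustration. It compares the stationary distribution obtained by power iteration with the empirical state frequencies of a simulated trajectory. It does not implement pTSE or its ensemble step.

```python
import random
from typing import List


def stationary_distribution(P: List[List[float]], iters: int = 500) -> List[float]:
    """Power iteration: repeatedly apply pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def empirical_state_frequencies(P: List[List[float]], steps: int, seed: int = 0) -> List[float]:
    """Simulate the chain from state 0 and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    n = len(P)
    counts = [0] * n
    state = 0
    for _ in range(steps):
        counts[state] += 1
        state = rng.choices(range(n), weights=P[state])[0]
    return [c / steps for c in counts]


if __name__ == "__main__":
    P = [[0.9, 0.1],
         [0.5, 0.5]]  # hypothetical two-state transition matrix
    print(stationary_distribution(P))
    print(empirical_state_frequencies(P, steps=50_000))
```

For this matrix the stationary distribution is (5/6, 1/6), and the simulated frequencies approach it as the number of steps grows.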
3440
DiSProD: Differentiable Symbolic Propagation of Distributions for Planning
The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy’s value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Robot planning
List of keywords 
Planning and Scheduling -> PS: Planning under uncertainty Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Robot planning
3449
K∗ Search over Orbit Space for Top-k Planning
Top-k planning, the task of finding k top-cost plans, is a key formalism for many planning applications, and K* search is a well-established approach to top-k planning. The algorithm iteratively runs A* search and Eppstein’s algorithm until a sufficient number of plans is found. The performance of the K* algorithm is therefore inherently limited by the performance of A*, and in order to improve K* performance, that of A* must be improved. In cost-optimal planning, orbit space search improves A* performance by exploiting symmetry pruning, essentially performing A* in the orbit space instead of the state space. In this work, we take a similar approach to top-k planning. We show theoretical equivalence between the goal paths in the state space and in the orbit space, which allows K* search to be performed in the orbit space instead, reconstructing plans from the paths found in the orbit space. We prove that our algorithm is sound and complete for top-k planning and empirically show that it achieves state-of-the-art performance, overtaking all existing top-k planners to date. The code is available at https://github.com/IBM/kstar.
Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Theoretical foundations of planning
List of keywords 
Planning and Scheduling -> PS: Search in planning and scheduling Planning and Scheduling -> PS: Planning algorithms
Planning and Scheduling -> PS: Theoretical foundations of planning
3475
Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?
As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, and illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embedding. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.
Machine Learning -> ML: Time series and data streams
List of keywords 
Data Mining -> DM: Networks Machine Learning -> ML: Time series and data streams
3478
RaMLP: Vision MLP via Region-aware Mixing
Recently, MLP-based architectures have achieved impressive results in image classification compared with CNNs and ViTs. However, they have an obvious limitation: their parameters are tied to the image size, so they can process only fixed image sizes. Therefore, they cannot directly adapt to dense prediction tasks (e.g., object detection and semantic segmentation) where images are of various sizes. Recent methods tried to address this but introduced two new problems: long-range dependencies or important visual cues are ignored. This paper presents a new MLP-based architecture, Region-aware MLP (RaMLP), to satisfy various vision tasks and address the above three problems. In particular, we propose a well-designed module, Region-aware Mixing (RaM). RaM captures important local information and further aggregates these important visual cues. Based on RaM, RaMLP achieves a global receptive field even in one block. It is worth noting that, unlike most existing MLP-based architectures that apply the same spatial weights to all samples, RaM is region-aware and adaptively determines weights to extract region-level features better. Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% AP^b or 1.0% mIoU) on dense prediction tasks. The training code can be found at https://github.com/xiaolai-sqlai/RaMLP.
Computer Vision -> CV: Representation learning
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization) Computer Vision -> CV: Representation learning
3482
Social Motivation for Modelling Other Agents under Partial Observability in Decentralised Training
Understanding other agents is a key challenge in constructing artificial social agents. Current works focus on centralised training, wherein agents are allowed to know all the information about others and the environmental state during training. In contrast, this work studies decentralised training, wherein agents must learn the model of other agents in order to cooperate with them under partially-observable conditions, even during training, i.e. learning agents are myopic. The intrinsic motivation for artificial agents is modelled on the concept of human social motivation that entices humans to meet and understand each other, especially when experiencing a utility loss. Our intrinsic motivation encourages agents to stay near each other to obtain better observations and construct a model of others. They do so when their model of other agents is poor, or the overall task performance is bad during the learning phase. This simple but effective method facilitates the process of modelling others, significantly improving performance in cooperative tasks. Our experiments demonstrate that the socially-motivated agent can model others better and promote cooperation across different tasks.
Agent-based and Multi-agent Systems -> MAS: Other
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Agent-based and Multi-agent Systems -> MAS: Other
3497
JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval
Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy, and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of Pitch, Onset and Offset, respectively, and JEPOO is robust for various types of data and instruments. The ablation study validates the effectiveness of each component of JEPOO.
Multidisciplinary Topics and Applications -> MDA: Entertainment
Multidisciplinary Topics and Applications -> MDA: Other
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Arts and creativity Multidisciplinary Topics and Applications -> MDA: Entertainment
Multidisciplinary Topics and Applications -> MDA: Other
3510
Boosting Few-Shot Open-Set Recognition with Multi-Relation Margin Loss
Few-shot open-set recognition (FSOSR) has become a great challenge, which requires classifying known classes and rejecting the unknown ones with only limited samples. Existing FSOSR methods mainly construct an ambiguous distribution of known classes from scarce known samples without considering the latent distribution information of unknowns, which degrades the performance of open-set recognition. To address this issue, we propose a novel loss function called multi-relation margin (MRM) loss that can be plugged into few-shot methods to boost the performance of FSOSR. MRM enlarges the margin between different classes by extracting the multi-relationship of paired samples to dynamically refine the decision boundary for known classes and implicitly delineate the distribution of unknowns. Specifically, MRM separates the classes by enforcing a margin while concentrating samples of the same class on a hypersphere with a learnable radius. In order to better capture the distribution information of each class, MRM extracts the similarity and correlations among paired samples, improving the optimization of the margin and radius. Experiments on public benchmarks reveal that methods with MRM loss can improve the AUROC of unknown detection by a significant margin while correctly classifying the known classes.
Machine Learning -> ML: Few-shot learning
List of keywords 
Machine Learning -> ML: Meta-learning Machine Learning -> ML: Few-shot learning
3525
SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference
Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must go through all consecutive layers before early exiting, and more complex samples usually go through more layers, so redundant computation still exists. In this paper, we propose a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. In addition, we propose cross-layer contrastive learning and combine it into our training phases to boost the intermediate layers and classifiers, which is beneficial for early exiting. To keep the usage of skipping gates consistent between the training and inference phases, we propose a hard weight mechanism during the training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves 2-3× computation reduction with minimal accuracy drops compared with BERT and our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets, we prove that early exiting based on entropy hardly works, and the skipping mechanism is essential for reducing computation.
Natural Language Processing -> NLP: Text classification
List of keywords 
Natural Language Processing -> NLP: Language models Natural Language Processing -> NLP: Text classification
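The entropy-based exiting criterion mentioned in the abstract above can be sketched generically as follows. This assumes each layer has its own classifier producing logits and that a sample exits at the first layer whose prediction entropy falls below a threshold; the function names, the threshold, and the logits are hypothetical. SmartBERT's skipping gates and exiting operator are not modelled here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(per_layer_logits: Sequence[Sequence[float]],
               threshold: float) -> Tuple[int, int]:
    """Return (exit layer index, predicted class) for one sample."""
    for layer, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            return layer, probs.index(max(probs))
    # No layer was confident enough: fall back to the final classifier.
    probs = softmax(per_layer_logits[-1])
    return len(per_layer_logits) - 1, probs.index(max(probs))


if __name__ == "__main__":
    logits = [[0.1, 0.0], [5.0, 0.0], [9.0, 0.0]]  # made-up logits for three layers
    print(early_exit(logits, threshold=0.1))
```

A lower threshold makes exits rarer and keeps more computation; a higher one exits earlier at the risk of less reliable predictions.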
3526
Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow
In financial scenarios, influenced by common factors such as global macroeconomic and sector-specific factors, stocks exhibit varying degrees of correlations with each other, which is essential in risk-averse portfolio allocation. Because the real risk matrix is unobservable, the covariance-based correlation matrix is widely used for constructing diversified stock portfolios. However, studies have seldom focused on dynamic correlation matrix estimation under the non-stationary financial market. Moreover, as the number of stocks in the market grows, existing correlation matrix estimation methods face more serious challenges with regard to efficiency and effectiveness. In this paper, we propose a novel hash-based dynamic correlation forecasting model (HDCF) to estimate dynamic stock correlations. Under structural assumptions on the correlation matrix, HDCF learns the hash representation based on normalizing flows instead of the real-valued representation, which performs extremely efficiently in high-dimensional settings. Experiments show that our proposed model outperforms baselines on portfolio decisions in terms of effectiveness and efficiency.
Machine Learning -> ML: Representation learning
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Finance Machine Learning -> ML: Representation learning
3531
Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks
Reinforcement learning (RL) has achieved considerable success in many fields, but applying it to real-world problems can be costly and risky because it requires a lot of online interaction. Recently, offline RL has shown the possibility of extracting a solution through existing logged data without online interaction. In this work, we propose an offline hierarchical RL method, Guider (Guide to Control), that can efficiently solve long-horizon and sparse-reward tasks from offline data. The high-level policy sequentially generates a subgoal that can guide the agent to arrive at the final goal, and the low-level policy learns how to reach each given guided subgoal. In the process of learning from offline data, the key is to ensure that the generated subgoals are reachable by the low-level policy. We show that high-quality subgoal generation is possible through pre-training a latent subgoal prior model. The well-regulated subgoal generation improves performance while avoiding distributional shifts in offline RL by breaking down long, complex tasks into shorter, easier ones. For evaluations, Guider outperforms prior offline RL methods in long-horizon robot navigation and complex manipulation benchmarks. Our code is available at https://github.com/gckor/Guider.
Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Behavior and control
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Behavior and control
3535
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between the model training (source) and deployment (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased to source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called UNiversal and joInt Knowledge Distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align teacher’s and student’s representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks. The source code is available at https://github.com/ijcai2023/UNI KD.
Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Unsupervised learning
List of keywords 
Machine Learning -> ML: Multi-task and transfer learning Machine Learning -> ML: Time series and data streams
Machine Learning -> ML: Unsupervised learning
3540
FedNoRo: Towards Noise-Robust Federated Learning by Addressing Class Imbalance and Label Noise Heterogeneity
Federated noisy label learning (FNLL) is emerging as a promising tool for privacy-preserving multi-source decentralized learning. Existing research, relying on the assumption of class-balanced global data, might be incapable of modeling complicated label noise, especially in medical scenarios. In this paper, we first formulate a new and more realistic federated label noise problem where global data is class-imbalanced and label noise is heterogeneous, and then propose a two-stage framework named FedNoRo for noise-robust federated learning. Specifically, in the first stage of FedNoRo, per-class loss indicators followed by a Gaussian Mixture Model are deployed for noisy client identification. In the second stage, knowledge distillation and a distance-aware aggregation function are jointly adopted for noise-robust federated model updating. Experimental results on the widely-used ICH and ISIC2019 datasets demonstrate the superiority of FedNoRo against the state-of-the-art FNLL methods for addressing class imbalance and label noise heterogeneity in real-world FL scenarios.
Machine Learning -> ML: Classification
Machine Learning -> ML: Robustness
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Classification
Machine Learning -> ML: Robustness
3545
Graph Neural Convection-Diffusion with Heterophily
Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs. The connected nodes are likely to be from different classes or have dissimilar features on heterophilic graphs. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to take into account both the diffusion of information due to homophily and the “convection” of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs, compared to the state-of-the-art methods. The code is available at https://github.com/zknus/Graph-Diffusion-CDE.
Machine Learning -> ML: Classification
List of keywords 
Machine Learning -> ML: Sequence and graph learning Machine Learning -> ML: Classification
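For reference, a generic convection-diffusion equation for a scalar quantity $u(x, t)$ can be written as below. The diffusion coefficient $D$ and the velocity field $\mathbf{v}$ are generic symbols assumed here for illustration; the graph-based formulation and sign conventions used in the paper above may differ.

```latex
\frac{\partial u}{\partial t}
  = \nabla \cdot \left( D \, \nabla u \right)
  - \nabla \cdot \left( \mathbf{v} \, u \right)
```

The first term on the right spreads $u$ smoothly between neighbouring locations (diffusion), while the second transports it along $\mathbf{v}$ (convection).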
3548
Analyzing and Combating Attribute Bias for Face Restoration
Face restoration (FR) recovers high resolution (HR) faces from low resolution (LR) faces and is challenging due to its ill-posed nature. With years of development, existing methods can produce quality HR faces with realistic details. However, we observe that key facial attributes (e.g., age and gender) of the restored faces could be dramatically different from the LR faces and call this phenomenon attribute bias, which is fatal when using FR for applications such as surveillance and security. Thus, we argue that FR should consider not only image quality as in existing works but also attribute bias. To this end, we thoroughly analyze attribute bias with extensive experiments and find that two major causes are the lack of attribute information in LR faces and bias in the training data. Moreover, we propose the DebiasFR framework to produce HR faces with high image quality and accurate facial attributes. The key design is to explicitly model the facial attributes, which also allows adjusting the facial attributes of the output HR faces. Experimental results show that DebiasFR has comparable image quality but significantly smaller attribute bias when compared with state-of-the-art FR methods.
Computer Vision -> CV: Bias, fairness and privacy
Computer Vision -> CV: Neural generative models, auto encoders, GANs
List of keywords 
Computer Vision -> CV: Applications Computer Vision -> CV: Bias, fairness and privacy
Computer Vision -> CV: Neural generative models, auto encoders, GANs
3556
Not Only Pairwise Relationships: Fine-Grained Relational Modeling for Multivariate Time Series Forecasting
Recent graph-based methods achieve significant success in multivariate time series modeling and forecasting due to their ability to handle relationships among time series variables. However, only pairwise relationships are considered in most existing works. They ignore beyond-pairwise relationships and their potential categories in practical scenarios, which leads to incomplete relationship learning for multivariate time series forecasting. In this paper, we present ReMo, a Relational Modeling-based method, to promote fine-grained relational learning among multivariate time series data. Firstly, by treating time series variables and complex relationships as nodes and hyperedges, we extract multi-view hypergraphs from data to capture beyond-pairwise relationships. Secondly, a novel hypergraph message passing strategy is designed to characterize both nodes and hyperedges by inferring the potential categories of relationships and further distinguishing their impacts on time series variables. By integrating these two modules into the time series forecasting framework, ReMo effectively improves the performance of multivariate time series forecasting. The experimental results on seven commonly used datasets from different domains demonstrate the superiority of our model.
Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining spatial and/or temporal data
List of keywords 
Machine Learning -> ML: Time series and data streams Data Mining -> DM: Mining graphs
Data Mining -> DM: Mining spatial and/or temporal data
3562
Acoustic NLOS Imaging with Cross Modal Knowledge Distillation
Acoustic non-line-of-sight (NLOS) imaging aims to reconstruct hidden scenes by analyzing reflections of acoustic waves. Despite recent developments in the field, existing methods still have limitations such as sensitivity to noise in a physical model and difficulty in reconstructing unseen objects in a deep learning model. To address these limitations, we propose a novel cross-modal knowledge distillation (CMKD) approach for acoustic NLOS imaging. Our method transfers knowledge from a well-trained image network to an audio network, effectively combining the strengths of both modalities. As a result, it is robust to noise and superior in reconstructing unseen objects. Additionally, we evaluate on real-world datasets and demonstrate that the proposed method outperforms state-of-the-art methods in acoustic NLOS imaging. The experimental results indicate that CMKD is an effective solution for addressing the limitations of current acoustic NLOS imaging methods. Our code, model, and data are available at https://github.com/shineh96/Acoustic-NLOS-CMKD.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Multi-modal learning
List of keywords 
Computer Vision -> CV: Applications Computer Vision -> CV: Neural generative models, auto encoders, GANs
Machine Learning -> ML: Multi-modal learning
3566
GLPocket: A Multi-Scale Representation Learning Approach for Protein Binding Site Prediction
Protein binding site prediction is an important prerequisite for the discovery of new drugs. Usually, natural 3D U-Net is adopted as the standard site prediction framework to do per-voxel binary mask classification. However, this scheme only performs feature extraction for single-scale samples, which may cause the loss of global or local information, resulting in incomplete predictions, artifacts, or even missed predictions. To tackle this issue, we propose a network called GLPocket, which is based on the Lmser (Least mean square error reconstruction) network and utilizes multi-scale representation to predict binding sites. Firstly, GLPocket uses a Target Cropping Block (TCB) for targeted prediction. TCB selects the local features of interest from the global representations to perform concentrated prediction, and reduces the volume of feature maps to be calculated by 82% without adding parameters. It integrates global distribution information into local regions, making prediction more concentrated in the decoding stage. Secondly, GLPocket establishes long-range relationships among patches within the local region with a Transformer Block (TB), to enrich local context semantic information. Experiments show that GLPocket improves by 0.5%-4% on DCA Top-n prediction compared with previous state-of-the-art methods on four datasets. Our code has been released at https://github.com/CMACH508/GLPocket.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Computer Vision -> CV: Biomedical image analysis
3572
DeepPSL: End-to-End Perception and Reasoning
We introduce DeepPSL, a variant of probabilistic soft logic (PSL), to produce an end-to-end trainable system that integrates reasoning and perception. PSL represents first-order logic in terms of a convex graphical model: hinge-loss Markov random fields (HL-MRFs). PSL stands out among probabilistic logic frameworks due to its tractability, having been applied to systems of more than 1 billion ground rules. The key to our approach is to represent predicates in first-order logic using deep neural networks and then to approximately back-propagate through the HL-MRF and thus train every aspect of the first-order system being represented. We believe that this approach represents an interesting direction for the integration of deep learning and reasoning techniques with applications to knowledge base learning, multi-task learning, and explainability. Evaluation on three different tasks demonstrates that DeepPSL significantly outperforms state-of-the-art neuro-symbolic methods on scalability while achieving comparable or better accuracy.
List of keywords
Machine Learning -> ML: Neuro-symbolic methods
Machine Learning -> ML: Knowledge-aided learning
Machine Learning -> ML: Learning graphical models
3573
Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and Catboost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
List of keywords
Machine Learning -> ML: Applications
Machine Learning -> ML: Classification
3636
Semi-supervised Domain Adaptation in Graph Transfer Learning
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding. Thus, the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Multi-task and transfer learning
Machine Learning -> ML: Semi-supervised learning
3639
Don’t Ignore Alienation and Marginalization: Correlating Fraud Detection
The anonymity of online networks makes tackling fraud increasingly costly. Thanks to the superiority of graph representation learning, graph-based fraud detection has made significant progress in recent years. However, evolving fraud strategies produce more advanced and difficult scams. One common strategy is synergistic camouflage, i.e., combining multiple means to deceive others. Existing methods mostly investigate the differences between relations for individual frauds and neglect the correlation among multi-relation fraudulent behaviors. In this paper, we design several statistics to validate the existence of synergistic camouflage of fraudsters by exploring the correlation among multi-relation interactions. From the multi-relation perspective, we find two distinctive features of fraudulent behaviors, i.e., alienation and marginalization. Based on this finding, we propose COFRAUD, a correlation-aware fraud detection model, which innovatively incorporates synergistic camouflage into fraud detection. It captures the correlation among multi-relation fraudulent behaviors. Experimental results on two public datasets demonstrate that COFRAUD achieves significant improvements over state-of-the-art methods.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Security and privacy
Data Mining -> DM: Applications
3654
Dual Prompt Learning for Continual Rain Removal from Single Images
Recent efforts have achieved remarkable progress in single image deraining on data with a stationary distribution. However, catastrophic forgetting raises practical concerns when applying these methods to real applications, where the data distributions change constantly. In this paper, we investigate the continual learning issue for rain removal and develop a novel, efficient continual-learning deraining transformer. Different from typical replay- or regularization-based methods, which increase overall training time or parameter space, our method relies on compact prompts, which are learnable parameters, to maintain both task-invariant and task-specific knowledge. Our prompts are applied at both the image and feature levels to effectively leverage transferred knowledge of images and features among different tasks. We conduct comprehensive experiments on widely used rain removal datasets, where our proposed dual prompt learning consistently outperforms prior state-of-the-art methods. Moreover, we observe that, even though our method is designed for continual learning, it still achieves superior results on data with a stationary distribution, which further demonstrates the effectiveness of our method. Our website is available at: http://liuminghao.com.cn/DPL/.
List of keywords 
Computer Vision -> CV: Computational photography
3655
Open-world Semi-supervised Novel Class Discovery
Traditional semi-supervised learning tasks assume that both labeled and unlabeled data follow the same class distribution, but realistic open-world scenarios are more complex, with unknown novel classes mixed into the unlabeled set. Therefore, it is highly challenging not only to recognize samples from known classes but also to discover the unknown number of novel classes within the unlabeled data. In this paper, we introduce a new open-world semi-supervised novel class discovery approach named OpenNCD, a progressive bi-level contrastive learning method over multiple prototypes. The proposed method is composed of two reciprocally enhanced parts. First, a bi-level contrastive learning method is introduced, which maintains the pair-wise similarity of the prototypes and the prototype group levels for better representation learning. Then, a reliable prototype similarity metric is proposed based on the common representing instances. Prototypes with high similarities will be grouped progressively for known class recognition and novel class discovery. Extensive experiments on three image datasets are conducted and the results show the effectiveness of the proposed method in open-world scenarios, especially with scarce known classes and labels.
List of keywords
Machine Learning -> ML: Semi-supervised learning
Machine Learning -> ML: Self-supervised Learning
3663
Diverse Approximations for Monotone Submodular Maximization Problems with a Matroid Constraint
Finding diverse solutions to optimization problems has been of practical interest for several decades, and has recently enjoyed increasing attention in research. While submodular optimization has been rigorously studied in many fields, its diverse solutions extension has not. In this study, we consider the most basic variants of submodular optimization, and propose two simple greedy algorithms, which are known to be effective at maximizing monotone submodular functions. These are equipped with parameters that control the trade-off between objective and diversity. Our theoretical contribution shows their approximation guarantees in both objective value and diversity, as functions of their respective parameters. Our experimental investigation with maximum vertex coverage instances demonstrates their empirical differences in terms of objective-diversity trade-offs.
List of keywords
Search -> S: Combinatorial search and optimisation
Search -> S: Heuristic search
3667
ActUp: Analyzing and Consolidating tSNE and UMAP
TSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study their full span of differences. We theoretically and experimentally evaluate the space of parameters in the TSNE and UMAP algorithms and observe that a single one — the normalization — is responsible for switching between them. This, in turn, implies that a majority of the algorithmic differences can be toggled without affecting the embeddings. We discuss the implications this has on several theoretic claims behind UMAP, as well as how to reconcile them with existing TSNE interpretations.
Based on our analysis, we provide a method (GDR) that combines previously incompatible techniques from TSNE and UMAP and can replicate the results of either algorithm. This allows our method to incorporate further improvements, such as an acceleration that obtains either method’s outputs faster than UMAP. We release improved versions of TSNE, UMAP, and GDR that are fully plug-and-play with the traditional libraries.
List of keywords
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Data Mining -> DM: Data visualization
Machine Learning -> ML: Unsupervised learning
3671
An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation
Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
List of keywords 
Natural Language Processing -> NLP: Information retrieval and text mining
3675
CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo
The core of Multi-view Stereo (MVS) is the matching process among reference and source pixels. Cost aggregation plays a significant role in this process, and previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs, which fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to introduce Transformers into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity caused by the Transformer, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on the cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, the Residual Regression Transformer (RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.
List of keywords 
Computer Vision -> CV: 3D computer vision
3686
Incremental and Decremental Optimal Margin Distribution Learning
Incremental and decremental learning (IDL) deals with tasks where new data arrives sequentially as a stream or old data continually becomes unavailable due to privacy protection. Existing IDL methods mainly focus on the support vector machine and its variants with linear-type loss. There are few studies on quadratic-type loss, whose Lagrange multipliers are unbounded and much more difficult to track. In this paper, we take the latest statistical learning framework, the optimal margin distribution machine (ODM), which involves a quadratic-type loss due to the optimization of margin variance, as an example, and equip it with the ability to handle IDL tasks. Our proposed ID-ODM avoids updating the Lagrange multipliers over an infinite range by determining their optimal values beforehand, and is therefore much more efficient. Moreover, ID-ODM is also applicable when multiple instances arrive and leave simultaneously. Extensive empirical studies show that ID-ODM can achieve a 9.1x speedup on average with almost no loss in generalization compared to retraining ODM on the new data set from scratch.
List of keywords
Machine Learning -> ML: Classification
Machine Learning -> ML: Incremental learning
3697
Neural Capacitated Clustering
Recent work on deep clustering has also found promising new methods for constrained clustering problems.
Their typically pairwise constraints can often be used to guide the partitioning of the data.
Many problems, however, feature cluster-level constraints, e.g. the Capacitated Clustering Problem (CCP), where each point has a weight and the total weight of all points in each cluster is bounded by a prescribed capacity.
In this paper we propose a new method for the CCP, Neural Capacitated Clustering, that learns a neural network to predict the assignment probabilities of points to cluster centers from a data set of optimal or near-optimal past solutions of other problem instances.
During inference, the resulting scores are then used in an iterative k-means like procedure to refine the assignment under capacity constraints. 
In our experiments on artificial data and two real world datasets our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature. 
Moreover, we apply our method in the context of a cluster-first-route-second approach to the Capacitated Vehicle Routing Problem (CVRP) and show competitive results on the well-known Uchoa benchmark.
List of keywords
Machine Learning -> ML: Clustering
Constraint Satisfaction and Optimization -> CSO: Applications
Machine Learning -> ML: Geometric learning
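The capacity constraint described in the abstract above (each point has a weight, and the total weight per cluster is bounded by a capacity) can be illustrated with a minimal Python sketch. This is not the paper's neural method: the function names and the greedy nearest-feasible-center rule are assumptions chosen only to make the constraint concrete.

```python
import math
from collections import defaultdict


def cluster_loads(weights, assignment):
    """Sum the point weights assigned to each cluster index."""
    loads = defaultdict(float)
    for weight, cluster in zip(weights, assignment):
        loads[cluster] += weight
    return dict(loads)


def is_feasible(weights, assignment, capacity):
    """CCP constraint: no cluster's total weight may exceed the capacity."""
    return all(load <= capacity for load in cluster_loads(weights, assignment).values())


def greedy_assign(points, weights, centers, capacity):
    """Hypothetical baseline: place the heaviest points first, each at the
    nearest center that still has enough remaining capacity."""
    remaining = [capacity] * len(centers)
    assignment = [None] * len(points)
    # Stable sort by decreasing weight, so ties keep their input order.
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    for i in order:
        ranked = sorted(range(len(centers)),
                        key=lambda c: math.dist(points[i], centers[c]))
        for c in ranked:
            if weights[i] <= remaining[c]:
                assignment[i] = c
                remaining[c] -= weights[i]
                break
        else:
            raise ValueError(f"point {i} does not fit into any cluster")
    return assignment
```

For example, with three unit-weight points near the first of two centers and a capacity of 2, the third point is pushed to the second center even though the first is closer.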
3704
Flaws of Termination and Optimality in ADOPT-based Algorithms
A distributed constraint optimization problem (DCOP) is a framework to model multi-agent coordination problems. Asynchronous distributed optimization (ADOPT) is a well-known complete DCOP algorithm, and owing to its superior characteristics, many variants have been proposed over the last decade. It is considered proven that ADOPT-based algorithms have the key properties of termination and optimality, which guarantee that the algorithms terminate in a finite time and obtain an optimal solution, respectively. In this paper, we present counterexamples to the termination and optimality of ADOPT-based algorithms. The flaws are classified into three types, at least one of which exists in each of ADOPT and seven of its variants that we analyzed. In other words, the algorithms may potentially not terminate or terminate with a suboptimal solution. We also propose an amended version of ADOPT that avoids the flaws in existing algorithms and prove that it has the properties of termination and optimality.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Distributed constraints
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Constraint Satisfaction and Optimization -> CSO: Constraint optimization
3705
On the Role of Memory in Robust Opinion Dynamics
We investigate opinion dynamics in a fully-connected system, consisting of n agents, where one of the opinions, called correct, represents a piece of information to disseminate. 
One source agent initially holds the correct opinion and remains with this opinion throughout the execution. The goal of the remaining agents is to quickly agree on this correct opinion. At each round, one agent chosen uniformly at random is activated: unless it is the source, the agent pulls the opinions of l random agents and then updates its opinion according to some rule. 
We consider a restricted setting, in which agents have no memory and they only revise their opinions on the basis of those of the agents they currently sample. 
This setting encompasses very popular opinion dynamics, such as the voter model and best-of-k majority rules. 
Qualitatively speaking, we show that lack of memory prevents efficient convergence. Specifically, we prove that any dynamics requires Omega(n^2) expected time, even under a strong version of the model in which activated agents have complete access to the current configuration of the entire system, i.e., the case l=n. Conversely, we prove that the simple voter model (in which l=1) correctly solves the problem, while almost matching the aforementioned lower bound.
These results suggest that, in contrast to symmetric consensus problems (that do not involve a notion of correct opinion), fast convergence on the correct opinion using stochastic opinion dynamics may require the use of memory.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Agent communication
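The voter model setting from the abstract above (one agent activated uniformly at random per round, pulling the opinion of l=1 random agent, with a source that never changes) can be simulated with a short Python sketch. Several details are assumptions not stated in the abstract: agent 0 plays the source, every other agent starts with the wrong opinion, and an activated agent may sample itself.

```python
import random


def simulate_voter_model(n, seed=0, max_rounds=None):
    """Return the number of rounds until all n agents hold the correct
    opinion, or None if max_rounds is reached first.

    Assumptions (not from the abstract): agent 0 is the source, all other
    agents start with the wrong opinion, and sampling includes oneself.
    """
    rng = random.Random(seed)
    opinions = [1] + [0] * (n - 1)  # 1 = correct opinion, 0 = wrong opinion
    num_correct = 1
    rounds = 0
    while num_correct < n:
        if max_rounds is not None and rounds >= max_rounds:
            return None
        rounds += 1
        agent = rng.randrange(n)        # activate one agent uniformly at random
        if agent == 0:
            continue                    # the source never revises its opinion
        sampled = rng.randrange(n)      # pull the opinion of l = 1 random agent
        new_opinion = opinions[sampled]
        num_correct += new_opinion - opinions[agent]
        opinions[agent] = new_opinion   # memoryless update: copy what was seen
    return rounds
```

Averaging the returned round counts over many seeds for growing n gives an empirical feel for how convergence time scales in this setting; the sketch itself makes no claim about matching the bounds proved in the paper.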
3709
Sub-Band Based Attention for Robust Polyp Segmentation
This article proposes a novel spectral-domain-based solution to the challenging task of polyp segmentation. The main contribution is based on an interesting finding of the significant existence of the middle frequency sub-band during the CNN process. Consequently, a Sub-Band based Attention (SBA) module is proposed, which uniformly adopts either the high or middle sub-bands of the encoder features to boost the decoder features and thus concretely improve the feature discrimination. A strong encoder supplying informative sub-bands is also very important, and we highly value CNN features enriched with local and global information. Therefore, a Transformer Attended Convolution (TAC) module is introduced as the main encoder block. It takes the Transformer features to boost the CNN features with stronger long-range object contexts. The combination of SBA and TAC leads to a novel polyp segmentation framework, SBA-Net. It adopts TAC to effectively obtain encoded features, which are also input to SBA, so that efficient sub-band-based attention maps can be generated for progressively decoding the bottleneck features. Consequently, SBA-Net can achieve robust polyp segmentation, as the experimental results demonstrate.
List of keywords
Computer Vision -> CV: Biomedical image analysis
Data Mining -> DM: Frequent pattern mining
Multidisciplinary Topics and Applications -> MDA: Health and medicine
Machine Learning -> ML: Applications
3724
Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium
Repeated games consider a situation where multiple agents are motivated by their independent rewards throughout learning. In general, the dynamics of their learning become complex. Especially when their rewards compete with each other, as in zero-sum games, the dynamics often do not converge to their optimum, i.e., the Nash equilibrium. To tackle such complexity, many studies have understood various learning algorithms as dynamical systems and discovered qualitative insights among the algorithms. However, such studies have yet to handle multi-memory games (where agents can memorize actions they played in the past and choose their actions based on their memories), even though memorization plays a pivotal role in artificial intelligence and interpersonal relationships. This study extends two major learning algorithms in games, i.e., replicator dynamics and gradient ascent, into multi-memory games. Then, we prove their dynamics are identical. Furthermore, theoretically and experimentally, we clarify that the learning dynamics diverge from the Nash equilibrium in multi-memory zero-sum games and reach heteroclinic cycles (sojourning longer around the boundary of the strategy space), providing a fundamental advance in learning in games.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
3761
Tractable Diversity: Scalable Multiperspective Ontology Management via Standpoint EL
The tractability of the lightweight description logic EL has allowed for the construction of large and widely used ontologies that support semantic interoperability. However, comprehensive domains with a broad user base are often at odds with strong axiomatisations otherwise useful for inferencing, since these are usually context dependent and subject to diverging perspectives.
In this paper we introduce Standpoint EL, a multi-modal extension of EL that allows for the integrated representation of domain knowledge relative to diverse, possibly conflicting standpoints (or contexts), which can be hierarchically organised and put in relation to each other. We establish that Standpoint EL still exhibits EL’s favourable PTime standard reasoning, whereas introducing additional features like empty standpoints, rigid roles, and nominals makes standard reasoning tasks intractable.
List of keywords
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
3814
Multi-Scale Subgraph Contrastive Learning
Graph-level contrastive learning, aiming to learn the representations for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply treat a graph and its augmented graph as a positive pair, and any other pair as a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with that of the original graph structure, and that whether two augmented graphs form a positive or negative pair is highly related to the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyses on eight real-world graph classification datasets demonstrate the effectiveness of the proposed method.
List of keywords
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning
3822
SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension
Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval, which automatically extracts the answer from text-form contents according to an audio-form question. However, existing spoken question answering approaches are mainly based on synthetically generated audio-form data, which may not transfer effectively to multi-accent spoken question answering in many real-world applications. In this paper, we construct a large-scale multi-accent human spoken dataset, SQuAD-SRC, in order to study the problem of multi-accent spoken reading comprehension. We choose 24 native English speakers from six different countries with various English accents and have the chosen speakers record audio-form questions for the corresponding text-form contents. The dataset consists of 98,169 spoken question answering pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of our SQuAD-SRC dataset and conduct extensive experiments on it by comparing cascaded SRC approaches and the enhanced end-to-end ones. Moreover, we explore various adaptation strategies to improve the SRC performance, especially for multi-accent spoken questions.
List of keywords
Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Speech
3832
Learning When to Use Automatic Tabulation in Constraint Model Reformulation
Combinatorial optimisation has numerous practical applications, such as planning, logistics, or circuit design. Problems such as these can be solved by approaches such as Boolean Satisfiability (SAT) or Constraint Programming (CP). Solver performance is affected significantly by the model chosen to represent a given problem, which has led to the study of model reformulation. One such method is tabulation: rewriting the expression of some of the model constraints in terms of a single “table” constraint. Successfully applying this process means identifying expressions amenable to transformation, which has typically been done manually. Recent work introduced an automatic tabulation using a set of hand-designed heuristics to identify constraints to tabulate. However, the performance of these heuristics varies across problem classes and solvers. Recent work has shown learning techniques to be increasingly useful in the context of automatic model reformulation. The goal of this study is to understand whether it is possible to improve the performance of such heuristics, by learning a model to predict whether or not to activate them for a given instance. Experimental results suggest that a random forest classifier is the most robust choice, improving the performance of four different SAT and CP solvers.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
Machine Learning -> ML: Classification
3834
Sorting and Hypergraph Orientation under Uncertainty with Predictions
Learning-augmented algorithms have been attracting increasing interest, but have only recently been considered in the setting of explorable uncertainty where precise values of uncertain input elements can be obtained by a query and the goal is to minimize the number of queries needed to solve a problem. We study learning-augmented algorithms for sorting and hypergraph orientation under uncertainty, assuming access to untrusted predictions for the uncertain values. Our algorithms provide improved performance guarantees for accurate predictions while maintaining worst-case guarantees that are best possible without predictions. For sorting, our algorithm uses the optimal number of queries for accurate predictions and at most twice the optimal number for arbitrarily wrong predictions. For hypergraph orientation, for any γ≥2, we give an algorithm that uses at most 1+1/γ times the optimal number of queries for accurate predictions and at most γ times the optimal number for arbitrarily wrong predictions. These tradeoffs are the best possible. We also consider different error metrics and show that the performance of our algorithms degrades smoothly with the prediction error in all the cases where this is possible.
Planning and Scheduling -> PS: Planning under uncertainty
List of keywords 
Search -> S: Combinatorial search and optimisation
Planning and Scheduling -> PS: Planning under uncertainty
3835
Regularisation for Efficient Softmax Parameter Generation in Low-Resource Text Classifiers
Meta-learning has made tremendous progress in recent years and was demonstrated to be particularly suitable in low-resource settings where training data is very limited. However, meta-learning models still require large amounts of training tasks to achieve good generalisation. Since labelled training data may be sparse, self-supervision-based approaches are able to further improve performance on downstream tasks. Although no labelled data is necessary for this training, a large corpus of unlabelled text needs to be available.  
In this paper, we improve on recent advances in meta-learning for natural language models that allow training on a diverse set of training tasks for few-shot, low-resource target tasks. We introduce a way to generate new training data without the need for additional supervised or unsupervised datasets. We evaluate the method on a diverse set of NLP tasks and show that the model decreases in performance when trained on this data without further adjustments. Therefore, we introduce and evaluate two methods for regularising the training process and show that they not only improve performance when used in conjunction with the new training data but also improve average performance when training only on the original data, compared to the baseline.
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Meta-learning
List of keywords 
Natural Language Processing -> NLP: Text classification
Machine Learning -> ML: Few-shot learning
Machine Learning -> ML: Meta-learning
3848
Distributional Multi-Objective Decision Making
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Other
List of keywords 
Uncertainty in AI -> UAI: Sequential decision making
Machine Learning -> ML: Reinforcement learning
Uncertainty in AI -> UAI: Other
3863
Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation
Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word that highly co-occurs with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at the time of prediction, biased words make a significantly higher contribution to the models’ predictions, and models tend to assign predicted labels by over-relying on the spurious correlation between words and labels. To mitigate models’ over-reliance on the shortcut (i.e., the spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve the model performance on adversarial data while maintaining good performance on in-domain data.
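The idea of down-weighting biased examples can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the co-occurrence threshold, the `min_count` filter, the linear weight `1 - alpha * degree`, and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def biased_words(examples, min_count=2, threshold=0.9):
    """Map each word to (label, ratio) if it co-occurs with one label
    in at least `threshold` of the examples that contain it (assumed criterion)."""
    word_label = defaultdict(Counter)
    for text, label in examples:
        for word in set(text.lower().split()):
            word_label[word][label] += 1
    biased = {}
    for word, counts in word_label.items():
        total = sum(counts.values())
        label, top = counts.most_common(1)[0]
        if total >= min_count and top / total >= threshold:
            biased[word] = (label, top / total)
    return biased


def example_weights(examples, biased, alpha=0.5):
    """Give each example a training weight; examples whose label matches
    a biased word they contain are down-weighted (assumed linear scheme)."""
    weights = []
    for text, label in examples:
        degree = 0.0
        for word in set(text.lower().split()):
            if word in biased and biased[word][0] == label:
                degree = max(degree, biased[word][1])
        weights.append(1.0 - alpha * degree)
    return weights


data = [("great movie", 1), ("great acting", 1),
        ("boring movie", 0), ("boring plot", 0), ("fine plot", 1)]
words = biased_words(data)
print(words)                         # 'great' and 'boring' are flagged
print(example_weights(data, words))  # only the last example keeps weight 1.0
```

The resulting weights would typically multiply the per-example loss during training, so that examples dominated by a word-label shortcut contribute less to the gradient.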
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
List of keywords 
Natural Language Processing -> NLP: Question answering
Natural Language Processing -> NLP: Interpretability and analysis of models for NLP
3873
Principal-Agent Boolean Games
We introduce and study a computational version of the principal-agent problem — a classic problem in Economics that arises when a principal desires to contract an agent to carry out some task, but has incomplete information about the agent or their subsequent actions. The key challenge in this setting is for the principal to design a contract for the agent such that the agent’s preferences are then aligned with those of the principal. We study this problem using a variation of Boolean games, where multiple players each choose valuations for Boolean variables under their control, seeking the satisfaction of a personal goal formula. In our setting, the principal can only observe some subset of these variables, and the principal chooses a contract which rewards players on the basis of the assignments they make for the variables that are observable to the principal. The principal’s challenge is to design a contract so that, firstly, the principal’s goal is achieved in some or all Nash equilibrium choices, and secondly, that the principal is able to verify that their goal is satisfied. In this paper, we formally define this problem and completely characterise the computational complexity of the most relevant decision problems associated with it.
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
3895
Towards Robust GAN-Generated Image Detection: A Multi-View Completion Representation
GAN-generated image detection has become the first line of defense against the malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well in detecting clean, known GAN samples, their success is largely attributable to overfitting unstable features such as frequency artifacts, which will cause failures when facing unknown GANs or perturbation attacks. To overcome the issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image tasks to model the diverse distributions of genuine images. Frequency-irrelevant features can be represented from the distributional discrepancies characterized by the completion models, which are stable, generalized, and robust for detecting unknown fake patterns. Then, a multi-view classification is devised with elaborated intra- and inter-view learning strategies to enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluated the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks. The results confirm our method’s improved effectiveness, generalization, and robustness over various baselines.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Security and privacy
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Security and privacy
3930
Matchings under One-Sided Preferences with Soft Quotas
Assigning applicants to posts in the presence of the preferences of applicants and quotas associated with posts is extensively investigated. For a post, a lower quota guarantees, and an upper quota limits, the number of applicants assigned to it. Typically, quotas are assumed to be fixed, which need not be the case in practice. We address this by introducing a soft quota setting, in which every post is associated with two values – lower target and upper target which together denote a range for the intended number of applicants in any assignment. Unlike the fixed quota setting, we allow the number of applicants assigned to a post to fall outside the range. This leads to assignments with deviation. Here, we study the problem of computing an assignment that has two orthogonal optimization objectives – minimizing the deviation (maximum or total) w.r.t. soft quotas and ensuring optimality w.r.t. preferences of applicants (rank-maximality or fairness). The order in which these objectives are considered, the different possibilities to optimize deviation combined with the well-studied notions of optimality w.r.t. preferences open up a range of optimization problems of practical importance. We present efficient algorithms based on flow-networks to solve these optimization problems.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
3934
Safe Reinforcement Learning via Probabilistic Logic Shields
Safe Reinforcement learning (Safe RL) aims at learning optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Reinforcement learning
List of keywords 
Uncertainty in AI -> UAI: Statistical relational AI
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Reinforcement learning
3935
Ranking-based Argumentation Semantics Applied to Logical Argumentation
In formal argumentation, a distinction can be made between extension-based semantics, where sets of arguments are either (jointly) accepted or not, and ranking-based semantics, where grades of acceptability are assigned to arguments. Another important distinction is that between abstract approaches, that abstract away from the content of arguments, and structured approaches, that specify a method of constructing argument graphs on the basis of a knowledge base. While ranking-based semantics have been extensively applied to abstract argumentation, little work has been done on ranking-based semantics for structured argumentation. In this paper, we make a systematic investigation into the behaviour of ranking-based semantics applied to existing formalisms for structured argumentation. We show that a wide class of ranking-based semantics gives rise to so-called culpability measures, and are relatively robust to specific choices in argument construction methods.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Argumentation
3943
Neuro-Symbolic Learning of Answer Set Programs from Raw Data
One of the ultimate goals of Artificial Intelligence is to assist humans in complex decision making. A promising direction for achieving this goal is Neuro-Symbolic AI, which aims to combine the interpretability of symbolic techniques with the ability of deep learning to learn from raw data. However, most current approaches require manually engineered symbolic knowledge, and where end-to-end training is considered, such approaches are either restricted to learning definite programs, or are restricted to training binary neural networks. In this paper, we introduce Neuro-Symbolic Inductive Learner (NSIL), an approach that trains a general neural network to extract latent concepts from raw data, whilst learning symbolic knowledge that maps latent concepts to target labels. The novelty of our approach is a method for biasing the learning of symbolic knowledge, based on the in-training performance of both neural and symbolic components. We evaluate NSIL on three problem domains of different complexity, including an NP-complete problem. Our results demonstrate that NSIL learns expressive knowledge, solves computationally complex problems, and achieves state-of-the-art performance in terms of accuracy and data efficiency. Code and technical appendix: https://github.com/DanCunnington/NSIL
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Logic programming
List of keywords 
Machine Learning -> ML: Neuro-symbolic methods
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Knowledge Representation and Reasoning -> KRR: Logic programming
3948
Automatic Verification for Soundness of Bounded QNP Abstractions for Generalized Planning
Generalized planning (GP) studies the computation of general solutions for a set of planning problems. Computing general solutions with correctness guarantees has long been a key issue in GP. Abstractions are widely used to solve GP problems. For example, a popular abstraction model for GP is qualitative numeric planning (QNP), which extends classical planning with non-negative real variables that can be increased or decreased by some arbitrary amount. Refinements of correct solutions of sound abstractions are solutions with correctness guarantees for GP problems. More recent literature proposed a uniform abstraction framework for GP and gave model-theoretic definitions of sound and complete abstractions for GP problems. In this paper, based on the previous work, we explore automatic verification of sound abstractions for GP. Firstly, we present a proof-theoretic characterization for sound abstractions. Secondly, based on the characterization, we give a sufficient condition for sound abstractions with deterministic actions. Then we study how to verify the sufficient condition when the abstraction models are bounded QNPs where integer variables can be incremented or decremented by one. To this end, we develop methods to handle counting and transitive closure, which are often used to define numerical variables. Finally, we implement a sound bounded QNP abstraction verification system and report experimental results on several domains.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
3955
Enhancing Network by Reinforcement Learning and Neural Confined Local Search
It has been found that many real networks, such as power grids and the Internet, are non-robust, i.e., attacking a small set of nodes would cause the paralysis of the entire network. Thus, the Network Enhancement Problem (NEP), i.e., improving the robustness of a given network by modifying its structure, has attracted increasing attention. Heuristics have been proposed to address NEP. However, a hand-engineered heuristic often has significant performance limitations. A recently proposed model solving NEP by reinforcement learning has shown better performance than heuristics on in-distribution datasets. However, their model shows considerably inferior out-of-distribution generalization ability when enhancing networks against the degree-based targeted attack. In this paper, we propose a more effective model with stronger generalization ability by incorporating domain knowledge including measurements of local network structures and decision criteria of heuristics. We further design a hierarchical attention model to utilize the network structure directly, where the query range changes from local to global. Finally, we propose neural confined local search (NCLS) to realize the effective search of a large neighborhood, which exploits a learned model to confine the neighborhood to avoid exhaustive enumeration. We conduct extensive experiments on synthetic and real networks to verify the ability of our models.
Machine Learning -> ML: Deep reinforcement learning
Search -> S: Search and machine learning
List of keywords 
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Deep reinforcement learning
Search -> S: Search and machine learning
3959
Orion: Online Backdoor Sample Detection via Evolution Deviance
Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.
Machine Learning -> ML: Adversarial machine learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
List of keywords 
Computer Vision -> CV: Adversarial learning, adversarial attack and defense methods
Machine Learning -> ML: Adversarial machine learning
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
3964
Efficient Object Search in Game Maps
Video games feature a dynamic environment where locations of objects (e.g., characters, equipment, weapons, vehicles, etc.) frequently change within the game world. Although searching for relevant nearby objects in such a dynamic setting is a fundamental operation, this problem has received little research attention. In this paper, we propose a simple lightweight index, called Grid Tree, to store objects and their associated textual data. Our index can be efficiently updated with the underlying updates such as object movements, and supports a variety of object search queries, including k nearest neighbors (returning the k closest objects), keyword k nearest neighbors (returning the k closest objects that satisfy query keywords), and several other variants. Our extensive experimental study, conducted on standard game map benchmarks and real-world keywords, demonstrates that our approach has up to two orders of magnitude faster update times for moving objects compared to state-of-the-art approaches such as navigation mesh and IR-tree. At the same time, query performance of our approach is similar to or better than that of IR-tree and up to two orders of magnitude faster than the other competitor.
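To make the two operations concrete (cheap updates for moving objects and k-nearest-neighbor queries), here is a minimal sketch of a flat uniform-grid index. It is not the Grid Tree from the paper: the class `GridIndex`, its methods, and the use of straight-line Euclidean distance (ignoring obstacles and keywords) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical flat grid index: objects are bucketed by cell."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> ids of objects in that cell
        self.pos = {}                  # object id -> (x, y)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def upsert(self, obj_id, x, y):
        """Insert an object or move it; only the old and new cells are touched."""
        old = self.pos.get(obj_id)
        if old is not None and self._cell(*old) != self._cell(x, y):
            self.cells[self._cell(*old)].discard(obj_id)
        self.cells[self._cell(x, y)].add(obj_id)
        self.pos[obj_id] = (x, y)

    def knn(self, x, y, k):
        """Return ids of the k closest objects by Euclidean distance."""
        if k <= 0:
            return []
        cx, cy = self._cell(x, y)
        found, seen, r = [], 0, 0
        while seen < len(self.pos):
            # Visit the cells at Chebyshev cell-distance exactly r from the query cell.
            for i in range(cx - r, cx + r + 1):
                for j in range(cy - r, cy + r + 1):
                    if max(abs(i - cx), abs(j - cy)) != r:
                        continue
                    for obj_id in self.cells.get((i, j), ()):
                        ox, oy = self.pos[obj_id]
                        found.append((math.hypot(ox - x, oy - y), obj_id))
                        seen += 1
            found.sort()
            # Objects in unvisited rings are farther than r * cell_size,
            # so the current top k is final once the k-th distance is within that bound.
            if len(found) >= k and found[k - 1][0] <= r * self.cell_size:
                break
            r += 1
        return [obj_id for _, obj_id in found[:k]]


index = GridIndex(cell_size=10)
index.upsert("a", 1, 1)
index.upsert("b", 25, 5)
index.upsert("c", 90, 90)
print(index.knn(0, 0, 2))   # ['a', 'b']
index.upsert("c", 2, 2)     # object c moves next to the query point
print(index.knn(0, 0, 2))   # ['a', 'c']
```

A move touches at most two buckets regardless of how many objects are stored, which is the property that makes grid-style indexes attractive when positions change every frame.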
Planning and Scheduling -> PS: Scheduling
Robotics -> ROB: Motion and path planning
List of keywords 
Search -> S: Heuristic search
Planning and Scheduling -> PS: Scheduling
Robotics -> ROB: Motion and path planning
3978
Participatory Budgeting: Data, Tools and Analysis
We provide a library of participatory budgeting data (Pabulib) and open source tools (Pabutools and Pabustats) for analysing this data.
We analyse how the results of participatory budgeting elections would change if a different selection rule was applied.
We provide evidence that the outcomes of the Method of Equal Shares would be considerably fairer than those of the Utilitarian Greedy rule that is currently in use. We also show that the division of the projects into districts and/or categories can in many cases be avoided when using proportional rules. We find that this would increase the overall utility of the voters.
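As a point of reference for the comparison above, a greedy rule of the kind commonly used in practice can be sketched in a few lines. This is a minimal sketch assuming approval ballots, ordering by raw vote count, and alphabetical tie-breaking; the function name and data layout are hypothetical and it does not use the Pabutools API.

```python
def utilitarian_greedy(costs, ballots, budget):
    """Fund projects in order of decreasing approval count,
    skipping any project that no longer fits in the remaining budget."""
    votes = {project: 0 for project in costs}
    for ballot in ballots:
        for project in ballot:
            votes[project] += 1

    selected, remaining = [], budget
    # Assumed tie-breaking: alphabetical order of project names.
    for project in sorted(costs, key=lambda p: (-votes[p], p)):
        if costs[project] <= remaining:
            selected.append(project)
            remaining -= costs[project]
    return selected


costs = {"park": 60, "library": 50, "bikes": 40}
ballots = [{"park"}, {"park"}, {"park", "library"}, {"library", "bikes"}, {"bikes"}]
print(utilitarian_greedy(costs, ballots, budget=100))  # ['park', 'bikes']
```

Because the rule looks only at vote totals, a large group of voters can determine most of the funded projects, which is the behaviour that proportional rules are designed to counteract.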
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4004
Maximin-Aware Allocations of Indivisible Chores with Symmetric and Asymmetric Agents
The real-world deployment of fair allocation algorithms usually involves a heterogeneous population of users, which makes it challenging for the users to get complete knowledge of the allocation except for their own bundles. Recently, a new fairness notion, maximin-awareness (MMA), was proposed; it guarantees that every agent is not the worst-off one, no matter how the items that are not allocated to this agent are distributed. We adapt and generalize this notion to the case of indivisible chores and when the agents may have arbitrary weights. Due to the inherent difficulty of MMA, we also consider its up to one and up to any relaxations. A string of results on the existence and computation of MMA-related fair allocations, and their connections to existing fairness concepts, is given.
Game Theory and Economic Paradigms -> GTEP: Computational social choice
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4015
Optimal Anytime Coalition Structure Generation Utilizing Compact Solution Space Representation
Coalition formation is a central approach for multiagent coordination. A crucial part of coalition formation that is extensively studied in AI is coalition structure generation: partitioning agents into coalitions to maximize overall value. 
In this paper, we propose a novel method for coalition structure generation by introducing a compact and efficient representation of coalition structures. Our representation partitions the solution space into smaller, more manageable subspaces that gather structures containing coalitions of specific sizes. Our proposed method combines two new algorithms, one which leverages our compact representation and a branch-and-bound technique to generate optimal coalition structures, and another that utilizes a preprocessing phase to identify the most promising sets of coalitions to evaluate. Additionally, we show how parts of the solution space can be gathered into groups to avoid their redundant evaluation and we investigate the computational gain that is achieved by avoiding that redundant processing. Through this approach, our algorithm is able to prune the solution space more efficiently. Our results show that the proposed algorithm is superior to prior state-of-the-art methods in generating optimal coalition structures under several value distributions.
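For readers unfamiliar with the problem, coalition structure generation can be stated as a small exact computation. The sketch below is the classic dynamic program over subsets, shown only to define the task; it is not the compact-representation or branch-and-bound method proposed in the paper, and the function name and the example value function are assumptions.

```python
def optimal_coalition_structure(agents, value):
    """Return (best total value, list of coalitions) by dynamic programming
    over bitmasks. `value` maps a frozenset of agents to a number."""
    n = len(agents)
    best = {}  # bitmask of agents -> (best value, best partition of that set)
    for mask in range(1, 1 << n):
        members = frozenset(agents[i] for i in range(n) if (mask >> i) & 1)
        best_val, best_cs = value(members), [members]
        low = mask & -mask          # lowest set bit, used to skip mirrored splits
        sub = (mask - 1) & mask     # iterate over proper non-empty submasks
        while sub:
            if sub & low:
                rest = mask ^ sub
                candidate = best[sub][0] + best[rest][0]
                if candidate > best_val:
                    best_val = candidate
                    best_cs = best[sub][1] + best[rest][1]
            sub = (sub - 1) & mask
        best[mask] = (best_val, best_cs)
    return best[(1 << n) - 1]


# Hypothetical value function that depends only on coalition size.
size_value = {1: 1.0, 2: 3.0, 3: 3.5}
total, structure = optimal_coalition_structure(
    ["a", "b", "c"], lambda coalition: size_value[len(coalition)]
)
print(total, structure)  # one pair plus one singleton, total value 4.0
```

This baseline examines every split of every subset, which becomes infeasible quickly as the number of agents grows; pruning the solution space is what anytime and branch-and-bound approaches aim to achieve.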
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
4025
SSML-QNet: Scale-Separative Metric Learning Quadruplet Network for Multi-modal Image Patch Matching
Multi-modal image matching is very challenging due to the significant diversity in visual appearance across images of different modalities. Typically, the existing well-performing methods mainly focus on learning invariant and discriminative features for measuring the relation between multi-modal image pairs. However, these methods often take the features as a whole and largely overlook the fact that features at different scales for the same image pair may have different similarity, which may lead to sub-optimal results. In this work, we propose a Scale-Separative Metric Learning Quadruplet network (SSML-QNet) for multi-modal image patch matching. Specifically, SSML-QNet can extract both relevant and irrelevant features of imaging modality with the proposed quadruplet network architecture. Then, the proposed Scale-Separative Metric Learning module separately encodes the similarity of different scale features with the pyramid structure. For each scale, cross-modal consistent features are extracted and measured by coordinate and channel-wise attention sequentially. This makes our network robust to appearance divergence caused by different imaging mechanisms. Experiments on the benchmark datasets (VIS-NIR, VIS-LWIR, Optical-SAR, and Brown) have verified that the proposed SSML-QNet is able to outperform other state-of-the-art methods. Furthermore, the cross-dataset transfer experiments on these four datasets have also shown that the proposed method transfers well across datasets.
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
List of keywords 
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning
4026
Moral Planning Agents with LTL Values
A moral planning agent (MPA) seeks to compare two plans or compute an optimal plan in an interactive setting with other agents, where relative ideality and optimality of plans are defined with respect to a prioritized value base. We model MPAs whose values are expressed by formulas of linear temporal logic (LTL) and define comparison for both joint plans and individual plans. We introduce different evaluation criteria for individual plans including an optimistic (risk-seeking) criterion, a pessimistic (risk-averse) one, and two criteria based on the use of anticipated responsibility. We provide complexity results for a variety of MPA problems.
Agent-based and Multi-agent Systems -> MAS: Normative systems
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Moral decision making
Agent-based and Multi-agent Systems -> MAS: Normative systems
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
4043
Label Specific Multi-Semantics Metric Learning for Multi-Label Classification: Global Consideration Helps
In multi-label classification, it is critical to capitalize on complicated data structures and semantic relationships. Metric learning serves as an effective strategy to provide a better measurement of distances between examples. Existing works on metric learning for multi-label classification mainly learn one single global metric that characterizes latent semantic similarity between multi-label instances. However, such single-semantics metric exploitation approaches can not capture the intrinsic properties of multi-label data possessed of rich semantics. In this paper, the first attempt towards multi-semantics metric learning for multi-label classification is investigated. Specifically, the proposed LIMIC approach simultaneously learns one global and multiple label-specific local metrics by exploiting label-specific side information. The global metric is learned to capture the commonality across all the labels and label-specific local metrics characterize the individuality of each semantic space. The combination of global metric and label-specific local metrics is utilized to construct latent semantic space for each label, in which similar intra-class instances are pushed closer and inter-class instances are pulled apart. Furthermore, metric-based label correlation regularization is constructed to maintain similarity between correlated label spaces. Extensive experiments on benchmark multi-label data sets validate the superiority of our proposed approach in learning effective distance metrics for multi-label classification.
Machine Learning -> ML: Classification
List of keywords 
Machine Learning -> ML: Multi-label
Machine Learning -> ML: Classification
4051
Solving the Identifying Code Set Problem with Grouped Independent Support
An important problem in network science is finding an optimal placement of sensors in nodes in order to uniquely detect failures in the network. This problem can be modelled as an identifying code set (ICS) problem, introduced by Karpovsky et al. in 1998. The ICS problem aims to find a cover of a set S, such that the elements in the cover define a unique signature for each of the elements of S, and to minimise the cover’s cardinality. In this work, we study a generalised identifying code set (GICS) problem, where a unique signature must be found for each subset of S that has a cardinality of at most k (instead of just each element of S). The concept of an independent support of a Boolean formula was introduced by Chakraborty et al. in 2014 to speed up propositional model counting, by identifying a subset of variables whose truth assignments uniquely define those of the other variables.
In this work, we introduce an extended version of independent support, grouped independent support (GIS), and show how to reduce the GICS problem to the GIS problem. We then propose a new solving method for finding a GICS, based on finding a GIS. We show that the prior state-of-the-art approaches yield integer-linear programming (ILP) models whose sizes grow exponentially with the problem size and k, while our GIS encoding only grows polynomially with the problem size and k. While the ILP approach can solve the GICS problem on networks of at most 494 nodes, the GIS-based method can handle networks of up to 21 363 nodes; a ∼40× improvement. The GIS-based method shows up to a 520× improvement on the ILP-based method in terms of median solving time. For the majority of the instances that can be encoded and solved by both methods, the cardinality of the solution returned by the GIS-based method is less than 10% larger than the cardinality of the solution found by the ILP method.
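The notion of a unique signature can be checked mechanically on small graphs. The sketch below assumes the standard graph formulation of identifying codes, where the signature of a vertex is the part of its closed neighbourhood that lies in the code; it covers only the basic ICS case (k = 1), uses exhaustive search rather than the GIS or ILP encodings discussed above, and all function names are hypothetical.

```python
from itertools import combinations


def is_identifying_code(adj, code):
    """True if every vertex has a non-empty signature (closed neighbourhood
    intersected with `code`) and no two vertices share a signature."""
    code = set(code)
    seen = set()
    for vertex, neighbours in adj.items():
        signature = frozenset((set(neighbours) | {vertex}) & code)
        if not signature or signature in seen:
            return False
        seen.add(signature)
    return True


def minimum_identifying_code(adj):
    """Brute-force search by increasing size; only viable for tiny graphs."""
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_identifying_code(adj, candidate):
                return set(candidate)
    return None  # e.g. when two vertices have identical closed neighbourhoods


# Path graph 1 - 2 - 3 - 4.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_identifying_code(path, {2, 3}))   # False: vertices 2 and 3 share {2, 3}
print(minimum_identifying_code(path))      # {1, 2, 3}
```

On the four-vertex path, no two-element code can work because a two-element set has only three non-empty subsets to serve as signatures for four vertices, so three sensors are needed.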
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Satisfiability
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint programming
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Satisfiability
4062
Strategic Resource Selection with Homophilic Agents
The strategic selection of resources by selfish agents is a classical research direction, with Resource Selection Games and Congestion Games as prominent examples. In these games, agents select available resources and their utility then depends on the number of agents using the same resources. This implies that there is no distinction between the agents, i.e., they are anonymous.
We depart from this very general setting by proposing Resource Selection Games with heterogeneous agents that strive for a joint resource usage with similar agents. So, instead of the number of other users of a given resource, our model considers agents with different types and the decisive feature is the fraction of same-type agents among the users. More precisely, similarly to Schelling Games, there is a tolerance threshold tau in [0,1] which specifies the agents’ desired minimum fraction of same-type agents on a resource. Agents strive to select resources where at least a tau-fraction of those resources’ users have the same type as themselves. For tau=1, our model generalizes hedonic diversity games with single-peaked utilities with a peak at 1. 
For our general model, we consider the existence and quality of equilibria and the complexity of maximizing the social welfare. Additionally, we consider a bounded rationality model, where agents can only estimate the utility of a resource, since they only know the fraction of same-type agents on a given resource, but not the exact numbers. Thus, they cannot know the impact a strategy change would have on a target resource. Interestingly, we show that this type of bounded rationality yields favorable game-theoretic properties and specific equilibria closely approximate equilibria of the full knowledge setting.
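The tolerance-threshold idea in this abstract can be illustrated with a small sketch. It is not the authors' model: the function names, the dictionary-based representation, the choice to count the agent itself among a resource's users, and the binary "content or not" check (rather than a graded utility) are all assumptions made for illustration.

```python
def same_type_fraction(assignment, types, agent):
    """Fraction of users on the agent's resource that share the agent's type.

    Assumption: the agent itself is counted among the users of its resource.
    """
    resource = assignment[agent]
    users = [a for a, r in assignment.items() if r == resource]
    same = sum(1 for a in users if types[a] == types[agent])
    return same / len(users)


def is_content(assignment, types, agent, tau):
    """True if at least a tau-fraction of the resource's users share the agent's type."""
    return same_type_fraction(assignment, types, agent) >= tau


# Hypothetical instance: five agents of two types on two resources.
types = {"a": "red", "b": "red", "c": "blue", "d": "blue", "e": "blue"}
assignment = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

for agent in sorted(types):
    print(agent, same_type_fraction(assignment, types, agent),
          is_content(assignment, types, agent, tau=0.5))
```

With tau = 0.5, the red agents on resource 0 meet the threshold while the lone blue agent there does not, which is the kind of situation that would give that agent a reason to move.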
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4068
Deliberation as Evidence Disclosure: A Tale of Two Protocol Types
We study a model inspired by deliberative practice, in which agents selectively disclose evidence about a set of alternatives prior to taking a final decision on them. We are interested in whether such a process, when iterated to termination, results in the objectively best alternatives being selected—thereby lending support to the idea that groups can be wise even when their members communicate with each other. We find that, under certain restrictions on the relative amounts of evidence, together with the actions available to the agents, there exist deliberation protocols in each of the two families we look at (i.e., simultaneous and sequential) that offer desirable guarantees. Simulation results further complement this picture, by showing how the distribution of evidence among the agents influences parameters of interest, such as the outcome of the protocols and the number of rounds until termination.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4071
Generalization through Diversity: Improving Unsupervised Environment Design
Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8×8 maze with three rooms, playing Chess on an 8×8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To address this, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with a high potential to learn), thereby making agent training redundant on all but one of those environments. To that end, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in the literature.
List of keywords
Planning and Scheduling -> PS: Search in planning and scheduling
Machine Learning -> ML: Deep reinforcement learning
Planning and Scheduling -> PS: POMDPs
4078
Co-Certificate Learning with SAT Modulo Symmetries
We present a new SAT-based method for generating all graphs up to isomorphism that satisfy a given co-NP property. Our method extends the SAT Modulo Symmetry (SMS) framework with a technique that we call co-certificate learning. If SMS generates a candidate graph that violates the given co-NP property, we obtain a certificate for this violation, i.e., a ‘co-certificate’ for the co-NP property. The co-certificate gives rise to a clause that the SAT solver, serving as SMS’s backend, learns as part of its CDCL procedure. We demonstrate that SMS plus co-certificate learning is a powerful method that allows us to improve the best-known lower bound on the size of Kochen-Specker vector systems, a problem that is central to the foundations of quantum mechanics and has been studied for over half a century. Our approach is orders of magnitude faster and scales significantly better than a recently proposed SAT-based method.
List of keywords
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
4087
Quantitative Reasoning and Structural Complexity for Claim-Centric Argumentation
Argumentation is a well-established formalism for nonmonotonic reasoning and a vibrant area of research in AI. Claim-augmented argumentation frameworks (CAFs) have been introduced to deploy a conclusion-oriented perspective. CAFs expand argumentation frameworks by an additional step which involves retaining claims for an accepted set of arguments. We introduce a novel concept of a justification status for claims, a quantitative measure of extensions supporting a particular claim. The well-studied problems of credulous and skeptical reasoning can then be seen as simply the two endpoints of the spectrum when considered as a justification level of a claim. Furthermore, we explore the parameterized complexity of various reasoning problems for CAFs, including the quantitative reasoning for claim assertions. We begin by presenting a suitable graph representation that includes arguments and their associated claims. Our analysis includes the parameter treewidth, and we present decomposition-guided reductions between reasoning problems in CAF and the validity problem for QBF.
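The view of credulous and skeptical reasoning as two endpoints of a spectrum can be sketched as follows. This is an illustration only: it assumes the extensions are already given as sets of claims, and it measures the justification level as the share of extensions containing a claim, which may differ from the exact measure defined in the paper. All names are hypothetical.

```python
from fractions import Fraction


def justification_level(extensions, claim):
    """Share of the given extensions (sets of claims) that contain `claim`.

    Assumption: at least one extension exists; the empty case is rejected
    rather than treated as vacuously accepted.
    """
    if not extensions:
        raise ValueError("at least one extension is required")
    supporting = sum(1 for ext in extensions if claim in ext)
    return Fraction(supporting, len(extensions))


def credulously_accepted(extensions, claim):
    """Lower endpoint: the claim appears in at least one extension."""
    return justification_level(extensions, claim) > 0


def skeptically_accepted(extensions, claim):
    """Upper endpoint: the claim appears in every extension."""
    return justification_level(extensions, claim) == 1


# Hypothetical claim sets of three extensions.
extensions = [{"p", "q"}, {"p"}, {"r"}]
print(justification_level(extensions, "p"))
print(credulously_accepted(extensions, "q"), skeptically_accepted(extensions, "q"))
```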
List of keywords
Knowledge Representation and Reasoning -> KRR: Argumentation
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
4090
Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
Transformers achieve excellent performance in a variety of domains since they can capture long-distance dependencies through the self-attention mechanism. However, self-attention is computationally costly due to its quadratic complexity and high memory consumption. In this paper, we propose a novel Transformer variant (Singularformer) that uses neural networks to learn the singular value decomposition process of the attention matrix to design a linear-complexity and memory-efficient global self-attention mechanism. Specifically, we decompose the attention matrix into the product of three matrix factors based on singular value decomposition and design neural networks to learn these matrix factors, then the associative law of matrix multiplication is used to linearize the calculation of self-attention. The above procedure allows us to compute self-attention as two-dimensional reduction processes in the first and second token dimensional spaces, followed by a multi-head self-attention computational process on the first dimensional reduced token features. Experimental results on 8 real-world datasets demonstrate that Singularformer performs favorably against the other Transformer variants with lower time and space complexity. Our source code is publicly available at https://github.com/CSUBioGroup/Singularformer.
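The role of the associative law mentioned in this abstract can be shown with a dependency-free sketch. It is not the Singularformer architecture: the three factors here are random matrices rather than outputs of learned networks, and the helper names, sizes, and seed are assumptions chosen for illustration. The point is only that multiplying the factors into the token features from right to left gives the same result without ever building the n × n attention matrix.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def diag(values):
    """Square diagonal matrix with `values` on the diagonal."""
    k = len(values)
    return [[values[i] if i == j else 0.0 for j in range(k)] for i in range(k)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, r = 12, 4, 3  # tokens, feature dimension, rank of the factorization

U = random_matrix(n, r, rng)                      # n x r factor
S = diag([rng.uniform(0.1, 1.0) for _ in range(r)])  # r x r diagonal factor
Vt = random_matrix(r, n, rng)                     # r x n factor
X = random_matrix(n, d, rng)                      # token features

# Quadratic route: materialize A = U S Vt (n x n), then apply it to X.
A = matmul(matmul(U, S), Vt)
out_quadratic = matmul(A, X)

# Linear route: U (S (Vt X)); no intermediate has more than n * max(r, d) entries.
out_linear = matmul(U, matmul(S, matmul(Vt, X)))

max_gap = max(abs(p - q)
              for row_p, row_q in zip(out_quadratic, out_linear)
              for p, q in zip(row_p, row_q))
print(max_gap)
```

For fixed r and d, the second route costs on the order of n · r · d multiplications, whereas the first needs on the order of n² · (r + d).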
List of keywords 
Machine Learning -> ML: Attention models
4097
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as StarCraft, Go, and Chess. However, knowledge transfer from Artificial “Experts” to humans remains a significant challenge. A promising avenue for such transfer would be the use of curricula. Recent methods in curricula generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress, and are not suited for training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained more efficiently under the “zone of proximal development”, our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM’s ability to represent the environment parameter space, and training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students, without any sacrifice in training quality.
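The idea of matching environment difficulty to student ability can be sketched with the classical one-parameter (Rasch) model from Item Response Theory. This is a stand-in, not PERM itself: PERM is a learned model, whereas the functions below, their names, and the scalar ability and difficulty values are assumptions used only to make the matching rule concrete.

```python
import math


def success_probability(ability, difficulty):
    """Rasch (one-parameter IRT) model: chance that the student succeeds."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_next_environment(ability, difficulties):
    """Index of the environment whose difficulty is closest to the ability.

    Under the Rasch model this is the environment whose success probability
    is closest to 0.5, i.e. neither trivial nor hopeless for the student.
    """
    return min(range(len(difficulties)),
               key=lambda i: abs(difficulties[i] - ability))


# Hypothetical difficulty estimates for three environments.
difficulties = [-1.0, 0.5, 2.0]
ability = 0.4

chosen = pick_next_environment(ability, difficulties)
print(chosen, success_probability(ability, difficulties[chosen]))
```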
List of keywords
Multidisciplinary Topics and Applications -> MDA: Social sciences
Humans and AI -> HAI: Computer-aided education
Multidisciplinary Topics and Applications -> MDA: Game playing
4104
Strip Attention for Image Restoration
As a long-standing task, image restoration aims to recover the latent sharp image from its degraded counterpart. In recent years, owing to the strong ability of self-attention in capturing long-range dependencies, Transformer based methods have achieved promising performance on multifarious image restoration tasks. However, the canonical self-attention leads to quadratic complexity with respect to input size, hindering its further applications in image restoration. In this paper, we propose a Strip Attention Network (SANet) for image restoration to integrate information in a more efficient and effective manner. Specifically, a strip attention unit is proposed to harvest the contextual information for each pixel from its adjacent pixels in the same row or column. By employing this operation in different directions, each location can perceive information from an expanded region. Furthermore, we apply various receptive fields in different feature groups to enhance representation learning. Incorporating these designs into a U-shaped backbone, our SANet performs favorably against state-of-the-art algorithms on several image restoration tasks. The code is available at https://github.com/c-yn/SANet.
List of keywords
Computer Vision -> CV: Other
Computer Vision -> CV: Machine learning for vision
Computer Vision -> CV: Representation learning
4118
Multi-Agent Intention Recognition and Progression
For an agent in a multi-agent environment, it is often beneficial to be able to predict what other agents will do next when deciding how to act. Previous work in multi-agent intention scheduling assumes a priori knowledge of the current goals of other agents. In this paper, we present a new approach to multi-agent intention scheduling in which an agent uses online goal recognition to identify the goals currently being pursued by other agents while acting in pursuit of its own goals. We show how online goal recognition can be incorporated into an MCTS-based intention scheduler, and evaluate our approach in a range of scenarios. The results demonstrate that our approach can rapidly recognise the goals of other agents even when they are pursuing multiple goals concurrently, and has similar performance to agents which know the goals of other agents a priori.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Planning and Scheduling -> PS: Activity and plan recognition
4122
Truthful Fair Mechanisms for Allocating Mixed Divisible and Indivisible Goods
We study the problem of designing truthful and fair mechanisms when allocating a mixture of divisible and indivisible goods. We first show that there does not exist an EFM (envy-free for mixed goods) and truthful mechanism in general. This impossibility result holds even if there is only one indivisible good and one divisible good and there are only two agents. Thus, we focus on some more restricted settings. Under the setting where agents have binary valuations on indivisible goods and identical valuations on a single divisible good (e.g., money), we design an EFM and truthful mechanism. When agents have binary valuations over both divisible and indivisible goods, we first show there exist EFM and truthful mechanisms when there are only two agents or when there is a single divisible good. On the other hand, we show that the mechanism maximizing Nash welfare cannot ensure EFM and truthfulness simultaneously.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Fair division
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Game Theory and Economic Paradigms -> GTEP: Mechanism design
4124
Online Harmonizing Gradient Descent for Imbalanced Data Streams One-Pass Classification
Many real-world streaming data are sequentially collected over time and with skew-distributed classes. In this situation, online learning models may tend to favor samples from majority classes, making the wrong decisions for those from minority classes. Previous methods try to balance the instance number of different classes or assign asymmetric cost values. They usually require data-buffers to store streaming data or pre-defined cost parameters. This study alternatively shows that the imbalance of instances can be implied by the imbalance of gradients. Then, we propose the Online Harmonizing Gradient Descent (OHGD) for one-pass online classification. By harmonizing the gradient magnitudes incurred by different classes, the method avoids a bias in favor of the majority class. Specifically, OHGD requires no data-buffer, extra parameters, or prior knowledge. It also handles imbalanced data streams the same way that it would handle balanced data streams, which facilitates its easy implementation. On top of a few common and mild assumptions, the theoretical analysis proves that OHGD enjoys a satisfying sub-linear regret bound. Extensive experimental results demonstrate the high efficiency and effectiveness in handling imbalanced data streams.
List of keywords
Data Mining -> DM: Mining data streams
Data Mining -> DM: Class imbalance and unequal cost
Machine Learning -> ML: Classification
4132
Lifelong Multi-view Spectral Clustering
In recent years, spectral clustering has become a well-known and effective algorithm in machine learning. However, traditional spectral clustering algorithms are designed for single-view data and a fixed task setting. This can become a limitation when dealing with new tasks in a sequence, as it requires accessing previously learned tasks. Hence it leads to high storage consumption, especially for multi-view datasets. In this paper, we address this limitation by introducing a lifelong multi-view clustering framework. Our approach uses view-specific knowledge libraries to capture intra-view knowledge across different tasks. Specifically, we propose two types of libraries: an orthogonal basis library that stores cluster centers in consecutive tasks, and a feature embedding library that embeds feature relations shared among correlated tasks. When a new clustering task arrives, the knowledge is iteratively transferred from libraries to encode the new task, and knowledge libraries are updated according to the online update formulation. Meanwhile, basis libraries of different views are further fused into a consensus library with adaptive weights. Experimental results show that our proposed method outperforms other competitive clustering methods on multi-view datasets by a large margin.
List of keywords
Machine Learning -> ML: Clustering
Machine Learning -> ML: Multi-view learning
4139
Recursive Small-Step Multi-Agent A* for Dec-POMDPs
We present recursive small-step multi-agent A* (RS-MAA*), an exact algorithm that optimizes the expected reward in decentralized partially observable Markov decision processes (Dec-POMDPs). RS-MAA* builds on multi-agent A* (MAA*), an algorithm that finds policies by exploring a search tree, but tackles two major scalability concerns. First, we employ a modified, small-step variant of the search tree that avoids the double exponential outdegree of the classical formulation. Second, we use a tight and recursive heuristic that we compute on-the-fly, thereby avoiding an expensive precomputation. The resulting algorithm is conceptually simple, yet it shows superior performance on a rich set of standard benchmarks.
List of keywords
Planning and Scheduling -> PS: Distributed and multi-agent planning
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning
Planning and Scheduling -> PS: Planning under uncertainty
4148
ScriptWorld: Text Based Environment for Learning Procedural Knowledge
Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments.
List of keywords
Natural Language Processing -> NLP: Applications
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Applications
Machine Learning -> ML: Applications
4152
Model Predictive Control with Reach-avoid Analysis
In this paper we investigate the optimal controller synthesis problem, so that the system under the controller can reach a specified target set while satisfying given constraints. Existing model predictive control (MPC) methods learn from a set of discrete states visited by previous (sub-)optimized trajectories and thus result in computationally expensive mixed-integer nonlinear optimization. In this paper a novel MPC method is proposed based on reach-avoid analysis to solve the controller synthesis problem iteratively. The reach-avoid analysis is concerned with computing a reach-avoid set which is a set of initial states such that the system can reach the target set successfully. It not only provides terminal constraints, which ensure feasibility of MPC, but also expands discrete states in existing methods into a continuous set (i.e., reach-avoid sets) and thus leads to nonlinear optimization which is more computationally tractable online due to the absence of integer variables. Finally, we evaluate the proposed method and make comparisons with state-of-the-art ones based on several examples.
List of keywords
Planning and Scheduling -> PS: Learning in planning and scheduling
Machine Learning -> ML: Optimization
4163
Treewidth-Aware Complexity for Evaluating Epistemic Logic Programs
Logic programs are a popular formalism for encoding many problems relevant to knowledge representation and reasoning as well as artificial intelligence. However, for modeling rational behavior it is oftentimes required to represent the concepts of knowledge and possibility. Epistemic logic programs (ELPs) are one such extension that enables both concepts, which correspond to being true in all or some possible worlds or stable models. For these programs, the parameter treewidth has recently regained popularity. We present complexity results for the evaluation of key ELP fragments for treewidth, which are exponentially better than known results for full ELPs. Unfortunately, we prove that obtained runtimes cannot be significantly improved, assuming the exponential time hypothesis. Our approach defines treewidth-aware reductions between quantified Boolean formulas and ELPs. We also establish that the completion of a program, as used in modern solvers, can be turned treewidth-aware, thereby linearly preserving treewidth.
List of keywords
Knowledge Representation and Reasoning -> KRR: Logic programming
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
4168
ALL-E: Aesthetics-guided Low-light Image Enhancement
Evaluating the performance of low-light image enhancement (LLE) is highly subjective, thus making integrating human preferences into image enhancement a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic criteria for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and motivates training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project pages/ALLE.html
List of keywords
Computer Vision -> CV: Computational photography
AI Ethics, Trust, Fairness -> ETF: Societal impact of AI
Humans and AI -> HAI: Personalization and user modeling
4170
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Incremental learning
4172
An Experimental Comparison of Multiwinner Voting Rules on Approval Elections
In this paper, we experimentally compare major approval-based multiwinner voting rules. To this end, we define a measure of similarity between two equal-sized committees subject to a given election. Using synthetic elections coming from several distributions, we analyze how similar the committees provided by prominent voting rules are. Our results can be visualized as maps of voting rules, which provide a counterpoint to a purely axiomatic classification of voting rules. The strength of our proposed method is its independence from preimposed classifications (such as the satisfaction of concrete axioms), and that it indeed offers a much finer distinction than the current state of axiomatic analysis.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4184
Towards Sharp Analysis for Distributed Learning with Random Features
In recent studies, the generalization properties for distributed learning and random features assumed the existence of the target concept over the hypothesis space. However, this strict condition is not applicable to the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via a data-dependent generating strategy, and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.
List of keywords
Machine Learning -> ML: Learning theory
Machine Learning -> ML: Kernel methods
4194
Exploring Effective Inter-Encoder Semantic Interaction for Document-Level Relation Extraction
In document-level relation extraction (RE), the models are required to correctly predict implicit relations in documents via relational reasoning. To this end, many graph-based methods have been proposed for this task. Despite their success, these methods still suffer from several drawbacks: 1) their interaction between document encoder and graph encoder is usually unidirectional and insufficient; 2) their graph encoders often fail to capture the global context of nodes in the document graph. In this paper, we propose a document-level RE model with a Graph-Transformer Network (GTN). The GTN includes two core sublayers: 1) the graph-attention sublayer that simultaneously models global and local contexts of nodes in the document graph; 2) the cross-attention sublayer, enabling GTN to capture the non-entity clue information from the document encoder. Furthermore, we introduce two auxiliary training tasks to enhance the bidirectional semantic interaction between the document encoder and GTN: 1) the graph node reconstruction that can effectively train our cross-attention sublayer to enhance the semantic transition from the document encoder to GTN; 2) the structure-aware adversarial knowledge distillation, by which we can effectively transfer the structural information of GTN to the document encoder. Experimental results on four benchmark datasets prove the effectiveness of our model. Our source code is available at https://github.com/DeepLearnXMU/DocRE-BSI.
List of keywords
Natural Language Processing -> NLP: Information extraction
Natural Language Processing -> NLP: Information retrieval and text mining
4195
Multi-view Contrastive Learning Hypergraph Neural Network for Drug-Microbe-Disease Association Prediction
Identifying the potential associations among drugs, microbes and diseases is of great significance in exploring the pathogenesis and improving precision medicine. There are plenty of computational methods for pair-wise association prediction, such as drug-microbe and microbe-disease associations, but few methods focus on the higher-order triple-wise drug-microbe-disease (DMD) associations. Driven by the advancement of hypergraph neural networks (HGNNs), we expect them to fully capture high-order interaction patterns behind the hypergraph formulated by DMD associations and realize sound prediction performance. However, the confirmed DMD associations are insufficient due to the high cost of in vitro screening, which forms a sparse DMD hypergraph and thus brings in suboptimal generalization ability. To mitigate the limitation, we propose a Multi-view Contrastive Learning Hypergraph Neural Network, named MCHNN, for DMD association prediction. We design a novel multi-view contrastive learning on the DMD hypergraph as an auxiliary task, which guides the HGNN to learn more discriminative representations and enhances the generalization ability. Extensive computational experiments show that MCHNN achieves satisfactory performance in DMD association prediction and, more importantly, demonstrate the effectiveness of our devised multi-view contrastive learning on the sparse DMD hypergraph.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Multidisciplinary Topics and Applications -> MDA: Health and medicine
4206
In Which Graph Structures Can We Efficiently Find Temporally Disjoint Paths and Walks?
A temporal graph has an edge set that may change over discrete time steps, and a temporal path (or walk) must traverse edges that appear at increasing time steps. Accordingly, two temporal paths (or walks) are temporally disjoint if they do not visit any vertex at the same time. The study of the computational complexity of finding temporally disjoint paths or walks in temporal graphs has recently been initiated by Klobas et al. This problem is motivated by applications in multi-agent path finding (MAPF), which include robotics, warehouse management, aircraft management, and traffic routing.
We extend Klobas et al.’s research by providing parameterized hardness results for very restricted cases, with a focus on structural parameters of the so-called underlying graph. On the positive side, we identify sufficiently simple cases where we can solve the problem efficiently. Our results reveal some surprising differences between the “path version” and the “walk version” (where vertices may be visited multiple times) of the problem, and answer several open questions posed by Klobas et al.
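The notion of temporal disjointness described in this abstract can be checked with a very small sketch. It rests on a simplifying assumption that is not taken from the paper: each walk is given as an explicit list of (vertex, time step) pairs, with one entry for every time step at which the walk occupies a vertex. The function name and the example walks are hypothetical.

```python
def temporally_disjoint(walk_a, walk_b):
    """True if the two walks never occupy the same vertex at the same time step.

    Each walk is a list of (vertex, time_step) pairs (an assumed encoding).
    """
    return set(walk_a).isdisjoint(walk_b)


# Both walks pass through "v", but at different time steps.
walk_1 = [("s", 0), ("v", 1), ("t", 2)]
walk_2 = [("u", 0), ("v", 2), ("w", 3)]
# This walk is on "v" at time step 1, colliding with walk_1.
walk_3 = [("x", 0), ("v", 1)]

print(temporally_disjoint(walk_1, walk_2))
print(temporally_disjoint(walk_1, walk_3))
```

Sharing a vertex is allowed as long as the visits happen at different times, which is what separates this notion from ordinary vertex-disjointness.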
Planning and Scheduling -> PS: Routing
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent planning Planning and Scheduling -> PS: Routing
4215
Semantic-Aware Generation of Multi-View Portrait Drawings
Neural radiance fields (NeRF) based methods have shown amazing performance in synthesizing 3D-consistent photographic images, but fail to generate multi-view portrait drawings. The key is that the basic assumption of these methods (a surface point is consistent when rendered from different views) does not hold for drawings. In a portrait drawing, the appearance of a facial point may change when viewed from different angles. Besides, portrait drawings usually present little 3D information and suffer from insufficient training data. To combat this challenge, in this paper, we propose a Semantic-Aware GEnerator (SAGE) for synthesizing multi-view portrait drawings. Our motivation is that facial semantic labels are view-consistent and correlate with drawing techniques. We therefore propose to collaboratively synthesize multi-view semantic maps and the corresponding portrait drawings. To facilitate training, we design a semantic-aware domain translator, which generates portrait drawings based on features of photographic faces. In addition, we use data augmentation via synthesis to mitigate collapsed results. We apply SAGE to synthesize multi-view portrait drawings in diverse artistic styles. Experimental results show that SAGE achieves significantly superior or highly competitive performance, compared to existing 3D-aware image synthesis methods. The codes are available at https://github.com/AiArt-HDU/SAGE.
Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
List of keywords 
Computer Vision -> CV: 3D computer vision Computer Vision -> CV: Neural generative models, auto encoders, GANs
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
4226
Depth-Relative Self Attention for Monocular Depth Estimation
Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased toward RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of similar depth can become more similar to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased toward RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.
Computer Vision -> CV: Applications
List of keywords 
Computer Vision -> CV: Machine learning for vision Computer Vision -> CV: Applications
4228
Auto-bidding with Budget and ROI Constrained Buyers
In online advertising markets, an increasing number of advertisers are adopting auto-bidders to buy advertising slots. This tool simplifies the process of optimizing bids based on various financial constraints.
In our study, we focus on second-price auctions where bidders have both private budget and private ROI (return on investment) constraints. We formulate the auto-bidding system design problem as a mathematical program and analyze the auto-bidders’ bidding strategy under such constraints. We demonstrate that our design ensures truthfulness, i.e., among all pure and mixed strategies, always reporting the truthful budget and ROI is an optimal strategy for the bidders. Although the program is non-convex, we provide a fast algorithm to compute the optimal bidding strategy for the bidders based on our analysis. We also study the welfare and provide a lower bound for the PoA (price of anarchy). Moreover, we prove that if all bidders utilize our auto-bidding system, a Bayesian Nash equilibrium exists. We provide a sufficient condition under which the iterated best response process converges to such an equilibrium. Finally, we conduct extensive experiments to empirically evaluate the effectiveness of our design.
Game Theory and Economic Paradigms -> GTEP: Mechanism design
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems Game Theory and Economic Paradigms -> GTEP: Mechanism design
4229
Targeting Minimal Rare Itemsets from Transaction Databases
The computation of minimal rare itemsets is a well-known task in data mining, with numerous applications, e.g., drug effect analysis and network security, among others. This paper presents a novel approach to the computation of minimal rare itemsets. First, we introduce a generalization of the traditional minimal rare itemset model called k-minimal rare itemset. A k-minimal rare itemset is defined as an itemset that becomes frequent or rare based on the removal of at least k or at most (k − 1) items from it. We claim that our work is the first to propose this generalization in the field of data mining. We then present a SAT-based framework for efficiently discovering k-minimal rare itemsets from large transaction databases. Afterwards, by partitioning the k-minimal rare itemset mining problem into smaller sub-problems, we aim to make it more manageable and easier to solve. Finally, to evaluate the effectiveness and efficiency of our approach, we conduct extensive experimental analysis using various popular datasets. We compare our method with existing specialized algorithms and CP-based algorithms commonly used for this task.
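For orientation, the traditional minimal rare itemset model that the abstract generalizes can be sketched by brute force: an itemset is rare if its support is below a threshold, and minimal rare if, in addition, every proper subset is frequent. The sketch below covers only this traditional model, not the k-minimal generalization or the SAT-based framework; the function names and the toy database are illustrative assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in transactions if itemset <= transaction)


def minimal_rare_itemsets(transactions, min_support):
    """Enumerate itemsets that are rare while all their proper subsets are frequent.

    Brute force over all itemsets, so only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                continue  # frequent, hence not rare
            # Support is anti-monotone, so it suffices to check the subsets
            # obtained by removing exactly one item.
            if all(support(candidate - {item}, transactions) >= min_support
                   for item in candidate):
                result.append(candidate)
    return result


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
mris = minimal_rare_itemsets(transactions, min_support=2)
print([sorted(itemset) for itemset in mris])
```

With a support threshold of 2, the item `d` is rare on its own, and `{b, c}` is rare although `b` and `c` are each frequent, so both are minimal rare.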
Constraint Satisfaction and Optimization -> CSO: Satisfiability
List of keywords 
Data Mining -> DM: Frequent pattern mining Constraint Satisfaction and Optimization -> CSO: Satisfiability
4236
Denial-of-Service or Fine-Grained Control: Towards Flexible Model Poisoning Attacks on Federated Learning
Federated learning (FL) is vulnerable to poisoning attacks, where adversaries corrupt the global aggregation results and cause denial-of-service (DoS). Unlike recent model poisoning attacks that optimize the amplitude of malicious perturbations along certain prescribed directions to cause DoS, we propose a flexible model poisoning attack (FMPA) that can achieve versatile attack goals. We consider a practical threat scenario where no extra knowledge about the FL system (e.g., aggregation rules or updates on benign devices) is available to adversaries. FMPA exploits the global historical information to construct an estimator that predicts the next round of the global model as a benign reference. It then fine-tunes the reference model to obtain the desired poisoned model with low accuracy and small perturbations. Besides the goal of causing DoS, FMPA can be naturally extended to launch a fine-grained controllable attack, making it possible to precisely reduce the global accuracy. Armed with precise control, malicious FL service providers can gain advantages over their competitors without getting noticed, hence opening a new attack surface in FL other than DoS. Even for the purpose of DoS, experiments show that FMPA significantly decreases the global accuracy, outperforming six state-of-the-art attacks.
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Adversarial machine learning
List of keywords 
Machine Learning -> ML: Federated learning AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Machine Learning -> ML: Adversarial machine learning
4248
Topological Planning with Post-unique and Unary Actions
We are interested in realistic planning problems to model the behavior of Non-Playable Characters (NPCs) in video games. Search-based action planning, introduced by the game F.E.A.R. in 2005, has an exponential time complexity, which allows controlling only a dozen NPCs between two frames. A close study of the plans generated in first-person shooters shows that: (1) actions are unary, (2) actions are contextually post-unique and (3) there are no two instances of the same action in an NPC’s plan. By considering (1), (2) and (3) as restrictions, we introduce new classes of problems with the Simplified Action Structure formalism, which indeed allow modeling realistic problems and whose instances are solvable by a linear-time algorithm. We also experimentally show that our algorithm is capable of managing millions of NPCs per frame.
Planning and Scheduling -> PS: Real-time planning
Multidisciplinary Topics and Applications -> MDA: Computer games
List of keywords 
Planning and Scheduling -> PS: Planning algorithms Planning and Scheduling -> PS: Real-time planning
Multidisciplinary Topics and Applications -> MDA: Computer games
4251
Unveiling Concepts Learned by a World-Class Chess-Playing Agent
In recent years, the state-of-the-art agents for playing abstract board games, like chess and others, have moved from using intricate hand-crafted models for evaluating the merits of individual game states toward using neural networks (NNs). This development has eased the encapsulation of the relevant domain-specific knowledge and resulted in much-improved playing strength. However, this has come at the cost of making the resulting models ill-interpretable and challenging to understand and use for enhancing human knowledge. Using a world-class superhuman-strength chess-playing engine as our testbed, we show how recent model probing interpretability techniques can shed light on concepts learned by the engine’s NN. Furthermore, to gain additional insight, we contrast the game-state evaluations of the NN to those of its counterpart hand-crafted evaluation model and identify and explain some of the main differences.
Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Game playing
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Game playing Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Game playing
4255
Multi-View Robust Graph Representation Learning for Graph Classification
The robustness of graph classification models plays an essential role in providing highly reliable applications. Previous studies along this line primarily focus on seeking the stability of the model in terms of overall data metrics (e.g., accuracy) when facing data perturbations, such as removing edges. Empirically, we find that these graph classification models also suffer from semantic bias and confidence collapse issues, which substantially hinder their applicability in real-world scenarios. To address these issues, we present MGRL, a multi-view representation learning model for graph classification tasks that achieves robust results. Firstly, we propose an instance-view consistency representation learning method, which utilizes a multi-granularity contrastive learning technique to perform semantic constraints on instance representations at both the node and graph levels, thus alleviating the semantic bias issue. Secondly, we propose a class-view discriminative representation learning method, which employs the prototype-driven class distance optimization technique to adjust intra- and inter-class distances, thereby mitigating the confidence collapse issue. Finally, extensive experiments and visualizations on eight benchmark datasets demonstrate the effectiveness of MGRL.
Machine Learning -> ML: Representation learning
List of keywords 
Machine Learning -> ML: Robustness Machine Learning -> ML: Representation learning
4257
Automatic Truss Design with Reinforcement Learning
Truss layout design, namely finding a lightweight truss layout satisfying all the physical constraints, is a fundamental problem in the building industry. Generating the optimal layout is a challenging combinatorial optimization problem, which can be extremely expensive to solve by exhaustive search. Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.
In this paper, we develop AutoTruss, a two-stage framework to efficiently generate both lightweight and valid truss layouts. AutoTruss first adopts Monte Carlo tree search to discover a diverse collection of valid layouts. Then RL is applied to iteratively refine the valid solutions. We conduct experiments and ablation studies in popular truss layout design test cases in both 2D and 3D settings.  AutoTruss outperforms the best-reported layouts by 25.1% in the most challenging 3D test cases, resulting in the first effective deep-RL-based approach in the truss layout design literature.
Machine Learning -> ML: Reinforcement learning
Search -> S: Combinatorial search and optimisation
List of keywords 
Machine Learning -> ML: Applications Machine Learning -> ML: Reinforcement learning
Search -> S: Combinatorial search and optimisation
4271
Algorithmics of Egalitarian versus Equitable Sequences of Committees
We study the election of sequences of committees, where in each of tau levels (e.g. modeling points in time) a committee consisting of k candidates from a common set of m candidates is selected. For each level, each of n agents (voters) may nominate one candidate whose selection would satisfy her. We are interested in committees which are good with respect to the satisfaction per day and per agent. More precisely, we look for egalitarian or equitable committee sequences. While both guarantee that at least x agents per day are satisfied, egalitarian committee sequences ensure that each agent is satisfied in at least y levels while equitable committee sequences ensure that each agent is satisfied in exactly y levels. We analyze the parameterized complexity of finding such committees for the parameters n, m, k, tau, x, and y, as well as combinations thereof.
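The two solution concepts described above can be made concrete with a small verifier. This is a sketch under an assumed encoding (one set of candidates per level, and one nomination list per level with `None` for an agent who nominates nobody); it only checks a given committee sequence and says nothing about the complexity of finding one.

```python
def satisfaction_counts(committees, nominations):
    """Count satisfied agents per level and satisfied levels per agent.

    committees[t] is the set of candidates selected at level t.
    nominations[t][a] is the candidate agent a nominates at level t, or None.
    """
    num_agents = len(nominations[0])
    per_level = []
    per_agent = [0] * num_agents
    for committee, level_nominations in zip(committees, nominations):
        satisfied = [nominee is not None and nominee in committee
                     for nominee in level_nominations]
        per_level.append(sum(satisfied))
        for agent, is_satisfied in enumerate(satisfied):
            per_agent[agent] += int(is_satisfied)
    return per_level, per_agent


def is_egalitarian(committees, nominations, x, y):
    """At least x satisfied agents per level, each agent satisfied in at least y levels."""
    per_level, per_agent = satisfaction_counts(committees, nominations)
    return all(c >= x for c in per_level) and all(c >= y for c in per_agent)


def is_equitable(committees, nominations, x, y):
    """At least x satisfied agents per level, each agent satisfied in exactly y levels."""
    per_level, per_agent = satisfaction_counts(committees, nominations)
    return all(c >= x for c in per_level) and all(c == y for c in per_agent)


# Hypothetical instance: tau = 2 levels, n = 3 agents, committees of size k = 1.
nominations = [["a", "a", "b"], ["b", "c", "b"]]
committees = [{"a"}, {"b"}]

print(is_egalitarian(committees, nominations, x=2, y=1))
print(is_equitable(committees, nominations, x=2, y=1))
```

In this instance the first agent is satisfied at both levels, so the sequence meets the egalitarian requirement for y = 1 but not the equitable one, which demands exactly y satisfied levels per agent.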
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice AI Ethics, Trust, Fairness -> ETF: Fairness and diversity
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
4276
Analyzing Intentional Behavior in Autonomous Agents under Uncertainty
Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event.  Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between ‘intentional’ and ‘accidental’ traffic collisions.
AI Ethics, Trust, Fairness -> ETF: Moral decision making
Planning and Scheduling -> PS: Markov decision processes
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Accountability AI Ethics, Trust, Fairness -> ETF: Moral decision making
Planning and Scheduling -> PS: Markov decision processes
4290
Measuring a Priori Voting Power in Liquid Democracy
We introduce new power indices to measure the a priori voting power of voters in liquid democracy elections where an underlying network restricts delegations. We argue that our power indices are natural extensions of the standard Penrose-Banzhaf index in simple voting games. 
We show that computing the criticality of a voter is #P-hard even in weighted games with weights polynomially-bounded in the size of the instance. 
However, for specific settings, such as when the underlying network is a bipartite or complete graph, recursive formulas can compute these indices for weighted voting games in pseudo-polynomial time. 
We highlight their theoretical properties and provide numerical results to illustrate how restricting the possible delegations can alter voters’ voting power.
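As background for the indices discussed above, the standard Penrose-Banzhaf index of a weighted voting game can be computed by brute force: a voter's index is the fraction of coalitions of the other voters for which adding that voter turns a losing coalition into a winning one. The sketch below covers only this standard index, not the liquid democracy extensions or the pseudo-polynomial recursive formulas mentioned in the abstract; the function name and the example game are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def penrose_banzhaf(weights, quota):
    """Standard Penrose-Banzhaf index of each voter in a weighted voting game.

    Enumerates all coalitions of the other voters, so it is exponential
    in the number of voters and only meant for small examples.
    """
    num_voters = len(weights)
    indices = []
    for voter in range(num_voters):
        others = [w for j, w in enumerate(weights) if j != voter]
        swings = 0
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(coalition)
                # The voter is critical if the coalition loses without
                # them and wins once their weight is added.
                if total < quota <= total + weights[voter]:
                    swings += 1
        indices.append(Fraction(swings, 2 ** (num_voters - 1)))
    return indices


# Hypothetical game with quota 3 and weights 2, 1, 1.
print(penrose_banzhaf([2, 1, 1], quota=3))
```

In this example the heavy voter is critical in three of the four coalitions of the others, while each light voter is critical in only one.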
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4306
Unifying Core-Guided and Implicit Hitting Set Based Optimization
Two of the most central algorithmic paradigms implemented in practical solvers for maximum satisfiability (MaxSAT) and other related declarative paradigms for NP-hard combinatorial optimization are the core-guided (CG) and implicit hitting set (IHS) approaches. We develop a general unifying algorithmic framework, based on the recent notion of abstract cores, that captures both CG and IHS computations. The framework offers a unified way of establishing the correctness of variants of the approaches, and can be instantiated in novel ways giving rise to new algorithmic variants of the core-guided and IHS approaches. We illustrate the latter aspect by developing a prototype implementation of an algorithm variant for MaxSAT based on the framework.
Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint optimization Constraint Satisfaction and Optimization -> CSO: Satisfiability
Constraint Satisfaction and Optimization -> CSO: Solvers and tools
4309
Quantifying Consistency and Information Loss for Causal Abstraction Learning
Structural causal models provide a formalism to express causal relations between variables of interest. Models and variables can represent a system at different levels of abstraction, whereby relations may be coarsened and refined according to the need of a modeller.
However, switching between different levels of abstraction requires evaluating a trade-off between the consistency and the information loss among different models.
In this paper we introduce a family of interventional measures that an agent may use to evaluate such a trade-off. We consider four measures suited for different tasks, analyze their properties, and propose algorithms to evaluate and learn causal abstractions. Finally, we illustrate the flexibility of our setup by empirically showing how different measures and algorithmic choices may lead to different abstractions.
List of keywords 
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
4311
Hyperspectral Image Denoising Using Uncertainty-Aware Adjustor
Hyperspectral image (HSI) denoising has achieved promising results with the development of deep learning. A mainstream class of methods exploits the spatial-spectral correlations and recovers each band with the aid of neighboring bands, collectively referred to as spectral auxiliary networks. However, these methods treat entire adjacent spectral bands equally. In theory, clearer and nearer bands tend to contain more reliable spectral information than noisier and farther ones with higher uncertainties. How to achieve spectral enhancement and adaptation of each adjacent band has become an urgent problem in HSI denoising. This work presents the UA-Adjustor, a comprehensive adjustor that enhances denoising performance by considering both the band-to-pixel and enhancement-to-adjustment aspects. Specifically, UA-Adjustor consists of three stages that evaluate the importance of neighboring bands, enhance neighboring bands based on uncertainty perception, and adjust the weight of spatial pixels in adjacent bands through estimated uncertainty. Owing to its simplicity, UA-Adjustor can be flexibly plugged into existing spectral auxiliary networks to improve denoising behavior at low cost. Extensive experimental results validate that the proposed solution can improve over recent state-of-the-art (SOTA) methods on both simulated and real-world benchmarks by a large margin.
Computer Vision -> CV: Computational photography
List of keywords 
Computer Vision -> CV: Applications Computer Vision -> CV: Computational photography
4313
A Rule-Based Modal View of Causal Reasoning
We present a novel rule-based semantics for causal reasoning as well as a number of modal languages interpreted over it. They enable us to represent some fundamental concepts in the theory of causality including causal necessity and possibility, interventionist conditionals and Lewisian conditionals. We provide complexity results for the satisfiability checking and model checking problem for these modal languages. Moreover, we study the relationship between our rule-based semantics and the structural equation modeling (SEM) approach to causal reasoning, as well as between our rule-based semantics for causal conditionals and the standard semantics for belief base change.
Knowledge Representation and Reasoning -> KRR: Causality
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
List of keywords 
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages Knowledge Representation and Reasoning -> KRR: Causality
Knowledge Representation and Reasoning -> KRR: Reasoning about knowledge and belief
4322
FEAMOE: Fair, Explainable and Adaptive Mixture of Experts
Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a novel "mixture-of-experts" inspired framework aimed at learning fairer, more interpretable models that can also rapidly adjust to drifts in both the accuracy and the fairness of a classifier. We illustrate our framework for three popular fairness measures and demonstrate how drift can be handled with respect to these fairness constraints. Experiments on multiple datasets show that our framework as applied to a mixture of linear experts is able to perform comparably to neural networks in terms of accuracy while producing fairer models. We then use the large-scale HMDA dataset and show that various models trained on HMDA demonstrate drift and FEAMOE can ably handle these drifts with respect to all the considered fairness measures and maintain model accuracy. We also prove that the proposed framework allows for producing fast Shapley value explanations, which makes computationally efficient feature attribution based explanations of model decisions readily available via FEAMOE.
AI Ethics, Trust, Fairness -> ETF: Bias
AI Ethics, Trust, Fairness -> ETF: Other
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Ethical, legal and societal issues AI Ethics, Trust, Fairness -> ETF: Bias
AI Ethics, Trust, Fairness -> ETF: Other
4339
GIDnets: Generative Neural Networks for Solving Inverse Design Problems via Latent Space Exploration
In a number of different fields, including Engineering, Chemistry and Physics, the design of technological tools and device structures is increasingly supported by deep-learning based methods, which provide suggestions on crucial architectural choices based on the properties that these tools and structures should exhibit. The paper proposes a novel architecture, named GIDnet, to address this inverse design problem, which is based on exploring a suitably defined latent space associated with the possible designs. Among its distinguishing features, GIDnet is capable of identifying the most appropriate starting point for the exploration and of likely converging to a point corresponding to a feasible design. Results of a thorough experimental activity show that GIDnet outperforms earlier approaches in the literature.
Machine Learning -> ML: Applications
Machine Learning -> ML: Autoencoders
List of keywords 
Machine Learning -> ML: Experimental methodology Machine Learning -> ML: Applications
Machine Learning -> ML: Autoencoders
4350
Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem
This paper targets the perceptual task of separating the different interacting voices, i.e., monophonic melodic streams, in a polyphonic musical piece. We target symbolic music, where notes are explicitly encoded, and model this task as a Multi-Trajectory Tracking (MTT) problem from discrete observations, i.e., notes in a pitch-time space. Our approach builds a graph from a musical piece, by creating one node for every note, and separates the melodic trajectories by predicting a link between two notes if they are consecutive in the same voice/stream. This kind of local, greedy prediction is made possible by node embeddings created by a heterogeneous graph neural network that can capture inter- and intra-trajectory information. Furthermore, we propose a new regularization loss that encourages the output to respect the MTT premise of at most one incoming and one outgoing link for every node, favoring monophonic (voice) trajectories; this loss function might also be useful in other general MTT scenarios. Our approach does not use domain-specific heuristics, is scalable to longer sequences and a higher number of voices, and can handle complex cases such as voice inversions and overlaps. We reach new state-of-the-art results for the voice separation task on classical music of different styles. All code, data, and pretrained models are available on https://github.com/manoskary/vocsep_ijcai2023
Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
List of keywords 
Machine Learning -> ML: Sequence and graph learning Machine Learning -> ML: Applications
Multidisciplinary Topics and Applications -> MDA: Arts and creativity
4351
Choose your Data Wisely: A Framework for Semantic Counterfactuals
Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits on a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end-user, as it could constitute an adversarial example (which is indistinguishable from the original data sample to an end-user). Instead, there are recent ideas that the notion of minimality in the context of counterfactuals should refer to the semantics of the data sample, and not to the feature space. In this work, we build on these ideas, and propose a framework that provides counterfactual explanations in terms of knowledge graphs. We provide an algorithm for computing such explanations (given some assumptions about the underlying knowledge), and quantitatively evaluate the framework with a user study.
Knowledge Representation and Reasoning -> KRR: Applications
Machine Learning -> ML: Explainable/Interpretable machine learning
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability Knowledge Representation and Reasoning -> KRR: Applications
Machine Learning -> ML: Explainable/Interpretable machine learning
4356
Front-to-End Bidirectional Heuristic Search with Consistent Heuristics: Enumerating and Evaluating Algorithms and Bounds
Recent research on bidirectional heuristic search (BiHS) is based on the must-expand pairs theory (MEP theory), which describes which pairs of nodes must be expanded during the search to guarantee the optimality of solutions. A separate line of research in BiHS has proposed algorithms that use lower bounds that are derived from consistent heuristics during search. This paper links these two directions, providing a comprehensive unifying view and showing that both existing and novel algorithms can be derived from the MEP theory. An extended set of bounds is formulated, encompassing both previously discovered bounds and new ones. Finally, the bounds are empirically evaluated by their contribution to the efficiency of the search.
List of keywords 
Search -> S: Heuristic search
4367
Measuring and Controlling Divisiveness in Rank Aggregation
In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of our divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Multidisciplinary Topics and Applications -> MDA: Social sciences
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Multidisciplinary Topics and Applications -> MDA: Social sciences
4371
Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation
In large-scale e-commerce live-stream recommendation, streamers are classified into different levels based on their popularity and other metrics for marketing. Several top streamers at the head level occupy a considerable amount of exposure, resulting in an unbalanced data distribution. A unified model for all levels that does not consider the imbalance issue can be biased towards head streamers and neglect the conflicts between levels. The lack of modeling of inter-level streamer correlations and intra-level streamer characteristics imposes obstacles to estimating user behaviors. To tackle these challenges, we propose a curriculum multi-level learning framework for imbalanced recommendation. We separate model parameters into shared and level-specific ones to explore the generality among all levels and the discrepancy for each level, respectively. The level-aware gradient descent and a curriculum sampling scheduler are designed to capture the de-biased commonalities from all levels as the shared parameters. During the training of the specific parameters, the hardness-aware learning rate and an adaptor are proposed to dynamically balance the training process. Finally, shared and specific parameters are combined to be the final model weights and learned in a cooperative training framework. Extensive experiments on a live-stream production dataset demonstrate the superiority of the proposed framework.
Machine Learning -> ML: Multi-task and transfer learning
List of keywords 
Data Mining -> DM: Recommender systems Machine Learning -> ML: Multi-task and transfer learning
4372
Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering
Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of the subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model’s problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in the process of answering commonsense questions. Generally, skills refer to the learned ability to perform a specific task or activity, and are derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and by utilizing skills as a critical driver in the CQA process. To be specific, DSCQA first employs a commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes a dynamic skill module to generate dynamic skill representations. Finally, in a perception and emphasis module, the various skill representations and the dynamic skill representations are used to help the question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.
Knowledge Representation and Reasoning -> KRR: Common-sense reasoning
List of keywords 
Natural Language Processing -> NLP: Question answering Knowledge Representation and Reasoning -> KRR: Common-sense reasoning
4376
Graph-based Semi-supervised Local Clustering with Few Labeled Nodes
Local clustering aims at extracting a local structure inside a graph without the necessity of knowing the entire graph structure. As the local structure is usually small in size compared to the entire graph, one can think of it as a compressive sensing problem where the indices of the target cluster can be thought of as a sparse solution to a linear system. In this paper, we apply this idea based on two pioneering works under the same framework and propose a new semi-supervised local clustering approach using only a few labeled nodes. Our approach improves on the existing works by taking the initial cut to be the entire graph and hence overcomes a major limitation of the existing works, which is the low quality of the initial cut. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Semi-supervised learning
List of keywords 
Machine Learning -> ML: Clustering Data Mining -> DM: Mining graphs
Machine Learning -> ML: Semi-supervised learning
4381
On the Compilability of Bounded Numeric Planning
Bounded numeric planning, where each numeric variable domain is bounded, is PSPACE-complete, but such a complexity result does not capture how hard it really is, since the same holds even for the practically much easier STRIPS fragment. A finer way to compare the difficulty of planning formalisms is through the notion of compilability, which, however, has been extensively studied only for classical planning by Nebel. This paper extends Nebel’s framework to the setting of bounded numeric planning. First, we identify a variety of numeric fragments differing in the degree of the polynomials involved and the availability of features such as conditional effects and Boolean conditions; then we study the compilability of these fragments to each other and to the classical fragments. Surprisingly, numeric and classical planning with conditional effects and Boolean conditions can be compiled both ways preserving plan size exactly, while the same does not hold when targeting pure STRIPS. Our study also reveals that numeric fragments cluster into two equivalence classes separated by the availability of incomplete initial state specifications, a feature that allows uncertainty in the initial state to be specified.
Knowledge Representation and Reasoning -> KRR: Knowledge compilation
List of keywords 
Planning and Scheduling -> PS: Theoretical foundations of planning Knowledge Representation and Reasoning -> KRR: Knowledge compilation
4383
Efficient and Equitable Deployment of Mobile Vaccine Distribution Centers
Vaccines have proven to be extremely effective in preventing the spread of COVID-19 and potentially ending the pandemic. Lack of access prevented many people from getting vaccinated early, so states such as Virginia deployed mobile vaccination sites in order to distribute vaccines across the state. Here we study the problem of deciding where these facilities should be placed and moved over time in order to minimize the distance each person needs to travel in order to be vaccinated. Traditional facility location models for this problem fail to incorporate the fact that our facilities are mobile (i.e., they can move over time). To this end, we instead model vaccine distribution as the Dynamic k-Supplier problem and give the first approximation algorithms for this problem. We then run extensive simulations on real-world datasets to show the efficacy of our methods. In particular, we find that natural baselines for Dynamic k-Supplier cannot take advantage of the mobility of the facilities, and perform worse than non-mobile k-Supplier algorithms.
Machine Learning -> ML: Clustering
Search -> S: Combinatorial search and optimisation
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Resource allocation Machine Learning -> ML: Clustering
Search -> S: Combinatorial search and optimisation
4389
Differentially Private Partial Set Cover with Applications to Facility Location
Set Cover is a fundamental problem in combinatorial optimization which has been studied for many decades due to its various applications across multiple domains. In many of these domains, the input data consists of locations, relationships, and other sensitive information of individuals which may be leaked through the set cover output. Attempts have been made to design privacy-preserving algorithms to solve Set Cover under privacy constraints. Under differential privacy, it has been proved that the Set Cover problem has strong impossibility results and no explicit forms of the output can be released to the public.
In this work, we observe that these hardness results dissolve when we turn to the Partial Set Cover problem, where we only need to cover a ρ ∈ (0,1) fraction of the elements. We show that this relaxation enables us to avoid the impossibility results, and give the first algorithm which outputs an explicit form of set cover with non-trivial utility guarantees under differential privacy. Using our algorithm as a subroutine, we design a differentially private bicriteria algorithm to solve a recently proposed facility location problem for vaccine distribution which generalizes the k-supplier with outliers. Our analysis shows that relaxing the covering requirement to serve only a ρ ∈ (0,1) fraction of the population/universe also allows us to circumvent the inherent hardness of k-supplier and give the first non-trivial guarantees.
Data Mining -> DM: Privacy-preserving data mining
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Security and privacy Data Mining -> DM: Privacy-preserving data mining
4391
Self-Recover: Forecasting Block Maxima in Time Series from Predictors with Disparate Temporal Coverage Using Self-Supervised Learning
Forecasting the block maxima of a future time window is a challenging task due to the difficulty in inferring the tail distribution of a target variable. As the historical observations alone may not be sufficient to train robust models to predict the block maxima, domain-driven process models are often available in many scientific domains to supplement the observation data and improve the forecast accuracy. Unfortunately, coupling the historical observations with process model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework to predict the block maxima of a time window by employing self-supervised learning to address the varying temporal data coverage problem. Specifically, Self-Recover uses a combination of contrastive and generative self-supervised learning schemes along with a denoising autoencoder to impute the missing values. The framework also combines representations of the historical observations with process model outputs via a residual learning approach and learns the generalized extreme value (GEV) distribution characterizing the block maxima values. This enables the framework to reliably estimate the block maxima of each time window along with its confidence interval. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover compared to other state-of-the-art forecasting methods.
Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning
List of keywords 
Machine Learning -> ML: Time series and data streams Machine Learning -> ML: Representation learning
Machine Learning -> ML: Self-supervised Learning
4401
A Logic-based Approach to Contrastive Explainability for Neurosymbolic Visual Question Answering
Visual Question Answering (VQA) is a well-known problem for which deep learning is key. This poses a challenge for explaining answers to questions, all the more so if advanced notions like contrastive explanations (CEs) are to be provided. The latter explain why an answer has been reached in contrast to a different one and are attractive as they focus on reasons necessary to flip a query answer. We present a CE framework for VQA that uses a neurosymbolic VQA architecture which disentangles perception from reasoning. Once the reasoning part is provided as a logical theory, we use answer-set programming, in which CE generation can be framed as an abduction problem. We validate our approach on the CLEVR dataset, which we extend with more sophisticated questions to further demonstrate the robustness of the modular architecture. While we achieve top performance compared to related approaches, we can also produce CEs for explanation, model debugging, and validation tasks, showing the versatility of the declarative approach to reasoning.
Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Explainable/Interpretable machine learning
List of keywords 
Machine Learning -> ML: Neuro-symbolic methods Knowledge Representation and Reasoning -> KRR: Logic programming
Machine Learning -> ML: Explainable/Interpretable machine learning
4402
Max Markov Chain
In this paper, we introduce Max Markov Chain (MMC), a novel model for sequential data with sparse correlations among the state variables. It may also be viewed as a special class of approximate models for High-order Markov Chains (HMCs). MMC is desirable for domains where the sparse correlations are long-term and vary in their temporal stretches. Although generally intractable, parameter optimization for MMC can be solved analytically. However, based on this result, we derive an approximate solution that is highly efficient empirically. When compared with HMC and approximate HMC models, MMC combines better sample efficiency, model parsimony, and an outstanding computational advantage. Such a quality allows MMC to scale to large domains where the competing models would struggle to perform. We compare MMC with several baselines with synthetic and real-world datasets to demonstrate MMC as a valuable alternative for stochastic modeling.
Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Tractable probabilistic models
List of keywords 
Uncertainty in AI -> UAI: Bayesian networks Uncertainty in AI -> UAI: Causality, structural causal models and causal inference
Uncertainty in AI -> UAI: Tractable probabilistic models
4407
Scalable Optimal Margin Distribution Machine
Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooted in the novel margin theory, which demonstrates better generalization performance than the traditional large-margin-based counterparts. Nonetheless, like other kernel methods, it suffers from the ubiquitous scalability problem regarding both computation time and memory. This paper proposes a scalable ODM, which can achieve a nearly tenfold speedup compared to the original ODM training method. For nonlinear kernels, we propose a novel distribution-aware partition method so that the local ODM trained on each partition is close to the global one and converges to it faster. When a linear kernel is applied, we extend a communication-efficient SVRG method to accelerate the training further. Extensive empirical studies validate that our proposed method is highly computationally efficient and almost never worsens the generalization.
Data Mining -> DM: Big data and scalability
Machine Learning -> ML: Kernel methods
List of keywords 
Machine Learning -> ML: Classification Data Mining -> DM: Big data and scalability
Machine Learning -> ML: Kernel methods
4413
Sequential Attention Source Identification Based on Feature Representation
Snapshot-observation-based source localization has been widely studied due to its accessibility and low cost. However, existing methods do not address the interaction of users in time-varying infection scenarios, so their accuracy decreases in heterogeneous interaction scenarios. To solve this critical issue, this paper proposes a sequence-to-sequence based localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI), based on an inductive learning idea. More specifically, the encoder focuses on generating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of prediction sources in different timestamps by a designed temporal attention mechanism. It is worth mentioning that the inductive learning idea ensures that TGASI can detect the sources in new scenarios without other prior knowledge, which proves the scalability of TGASI. Comprehensive experiments against SOTA methods demonstrate the higher detection performance and scalability of TGASI in different scenarios.
Data Mining -> DM: Networks
Machine Learning -> ML: Sequence and graph learning
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Web and social networks Data Mining -> DM: Networks
Machine Learning -> ML: Sequence and graph learning
4418
Relative Inconsistency Measures for Indefinite Databases with Denial Constraints
Handling conflicting information is an important challenge in AI. Measuring inconsistency is an approach that provides ways to quantify the severity of inconsistency and helps in understanding the primary sources of conflicts. In particular, a relative inconsistency measure computes, by some criteria, the proportion of the knowledge base that is inconsistent. In this paper we investigate relative inconsistency measures for indefinite databases, which allow for indefinite or partial information that is formally expressed by means of disjunctive tuples. We introduce a postulate-based definition of relative inconsistency measure for indefinite databases with denial constraints, and investigate the compliance of some relative inconsistency measures with rationality postulates for indefinite databases as well as for the special case of definite databases. Finally, we investigate the complexity of the problem of computing the value of the proposed relative inconsistency measures as well as of the problems of deciding whether the inconsistency value is lower than, greater than, or equal to a given threshold for indefinite and definite databases.
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
4419
Learning to Act for Perceiving in Partially Unknown Environments
Autonomous agents embedded in a physical environment need the ability to correctly perceive the state of the environment from sensory data. In partially observable environments, certain properties can be perceived only in specific situations and from certain viewpoints that can be reached by the agent by planning and executing actions. For instance, to understand whether a cup is full of coffee, an agent, equipped with a camera, needs to turn on the light and look at the cup from the top. When the proper situations to perceive the desired properties are unknown, an agent needs to learn them and plan to get in such situations. In this paper, we devise a general method to solve this problem by evaluating the confidence of a neural network online and by using symbolic planning. We experimentally evaluate the proposed approach on several synthetic datasets, and show the feasibility of our approach in a real-world scenario that involves noisy perceptions and noisy actions on a real robot.
Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Perception
List of keywords 
Robotics -> ROB: Cognitive robotics Planning and Scheduling -> PS: Learning in planning and scheduling
Robotics -> ROB: Perception
4423
Computing Twin-width with SAT and Branch & Bound
The graph width-measure twin-width recently attracted great attention because of its solving power and generality. Many prominent NP-hard problems are tractable on graphs of bounded twin-width if a certificate for the twin-width bound is provided as an input. Bounded twin-width subsumes other prominent structural restrictions such as bounded treewidth and bounded rank-width. Computing such a certificate is NP-hard itself, already for twin-width 4, and the only known implemented algorithm for twin-width computation is based on a SAT encoding. In this paper, we propose two new algorithmic approaches for computing twin-width that significantly improve the state of the art. Firstly, we develop a SAT encoding that is far more compact than the known encoding and consequently scales to larger graphs. Secondly, we propose a new Branch & Bound algorithm for twin-width that, on many graphs, is significantly faster than the SAT encoding. It utilizes a sophisticated caching system for partial solutions. Both algorithmic approaches are based on new conceptual insights into twin-width computation, including the reordering of contractions.
Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
List of keywords 
Constraint Satisfaction and Optimization -> CSO: Constraint programming Constraint Satisfaction and Optimization -> CSO: Applications
Constraint Satisfaction and Optimization -> CSO: Constraint satisfaction
4428
GTR: A Grafting-Then-Reassembling Framework for Dynamic Scene Graph Generation
Dynamic scene graph generation aims to identify visual relationships (subject-predicate-object) in frames based on spatio-temporal contextual information in the video. Previous work implicitly models the spatio-temporal interaction simultaneously, which leads to entanglement of spatio-temporal contextual information. To address this, we propose a Grafting-Then-Reassembling framework (GTR), which explicitly extracts intra-frame spatial information and inter-frame temporal information in two separate stages to decouple spatio-temporal contextual information. Specifically, we first graft a static scene graph generation model to generate static visual relationships within frames. Then we propose a temporal dependency model to extract the temporal dependencies across frames, and explicitly reassemble static visual relationships into dynamic scene graphs. Experimental results show that GTR achieves state-of-the-art performance on the Action Genome dataset. Further analyses reveal that the reassembling stage is crucial to the success of our framework.
Natural Language Processing -> NLP: Information extraction
List of keywords 
Computer Vision -> CV: Video analysis and understanding Natural Language Processing -> NLP: Information extraction
4431
Preferences and Constraints in Abstract Argumentation
In recent years there has been an increasing interest in extending Dung’s framework to facilitate the knowledge representation and reasoning process. In this paper, we present an extension of Abstract Argumentation Framework (AF) that allows for the representation of preferences over arguments’ truth values (3-valued preferences). For instance, we can express a preference stating that extensions where argument a is false (i.e. defeated) are preferred to extensions where argument b is false. Interestingly, such a framework generalizes the well-known Preference-based AF with no additional cost in terms of computational complexity for most of the classical argumentation semantics. Then, we further extend AF by considering both (3-valued) preferences and 3-valued constraints, that is, constraints of the form φ ⇒ v or v ⇒ φ, where φ is a logical formula and v is a 3-valued truth value. After investigating the complexity of the resulting framework, as both constraints and preferences may represent subjective knowledge of agents, we extend our framework by considering multiple agents and study the complexity of deciding acceptance of arguments in this context.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Argumentation
4438
Probabilistic Rule Induction from Event Sequences with Logical Summary Markov Models
Event sequences are widely available across application domains and there is a long history of models for representing and analyzing such datasets. Summary Markov models are a recent addition to the literature that help identify the subset of event types that influence event types of interest to a user. In this paper, we introduce logical summary Markov models, which are a family of models for event sequences that enable interpretable predictions through logical rules that relate historical predicates to the probability of observing an event type at any arbitrary position in the sequence. We illustrate their connection to prior parametric summary Markov models as well as probabilistic logic programs, and propose new models from this family along with efficient greedy search algorithms for learning them from data. The proposed models outperform relevant baselines on most datasets in an empirical investigation on a probabilistic prediction task. We also compare the number of influencers that various logical summary Markov models learn on real-world datasets, and conduct a brief exploratory qualitative study to gauge the promise of such symbolic models around guiding large language models for predicting societal events.
Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Time series and data streams
List of keywords 
Uncertainty in AI -> UAI: Tractable probabilistic models Data Mining -> DM: Mining spatial and/or temporal data
Machine Learning -> ML: Time series and data streams
4441
Multi-Robot Coordination and Layout Design for Automated Warehousing
With the rapid progress in Multi-Agent Path Finding (MAPF), researchers have studied how MAPF algorithms can be deployed to coordinate hundreds of robots in large automated warehouses. While most works try to improve the throughput of such warehouses by developing better MAPF algorithms, we focus on improving the throughput by optimizing the warehouse layout. We show that, even with state-of-the-art MAPF algorithms, commonly used human-designed layouts can lead to congestion for warehouses with large numbers of robots and thus have limited scalability. We extend existing automatic scenario generation methods to optimize warehouse layouts. Results show that our optimized warehouse layouts (1) reduce traffic congestion and thus improve throughput, (2) improve the scalability of the automated warehouses by doubling the number of robots in some cases, and (3) are capable of generating layouts with user-specified diversity measures.
Planning and Scheduling -> PS: Applications
Search -> S: Evolutionary computation
List of keywords 
Robotics -> ROB: Multi-robot systems Planning and Scheduling -> PS: Applications
Search -> S: Evolutionary computation
4454
Fair and Efficient Allocation of Indivisible Chores with Surplus
We study fair division of indivisible chores among n agents with additive disutility functions. Two well-studied fairness notions for indivisible items are envy-freeness up to one/any item (EF1/EFX), and the standard notion of economic efficiency is Pareto optimality (PO). There is a noticeable gap between the results known for both EF1 and EFX in the goods and chores settings. The case of chores turns out to be much more challenging. We reduce this gap by providing slightly relaxed versions of the known results on goods for the chores setting. Interestingly, our algorithms run in polynomial time, unlike their analogous versions in the goods setting.
We introduce the concept of k surplus in the chores setting, which means that up to k more chores are allocated to the agents and each of them is a copy of an original chore. We present a polynomial-time algorithm which gives EF1 and PO allocations with n-1 surplus.
We relax the notion of EFX slightly and define tEFX, which requires that the envy from agent i to agent j is removed upon the transfer of any chore from i’s bundle to j’s bundle. We give a polynomial-time algorithm that in the chores case for 3 agents returns an allocation which is either proportional or tEFX. Note that proportionality is a very strong criterion in the case of indivisible items, and hence both notions we guarantee are desirable.
Agent-based and Multi-agent Systems -> MAS: Resource allocation
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Agent-based and Multi-agent Systems -> MAS: Resource allocation
4461
Efficient Computation of General Modules for ALC Ontologies
We present a method for extracting general modules for ontologies formulated in the description logic ALC. A module for an ontology is an ideally substantially smaller ontology that preserves all entailments for a user-specified set of terms. As such, it has applications such as ontology reuse and ontology analysis. Different from classical modules, general modules may use axioms not explicitly present in the input ontology, which allows for additional conciseness. So far, general modules have only been investigated for lightweight description logics.
We present the first work that considers the more expressive description logic ALC. In particular, our contribution is a new method based on uniform interpolation supported by some new theoretical results. Our evaluation indicates that our general modules are often smaller than classical modules and uniform interpolants computed by state-of-the-art methods, and, compared with uniform interpolants, can be computed in significantly shorter time. Moreover, our method can be used for, and in fact improves, the computation of uniform interpolants and classical modules.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies
4473
Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.
Computer Vision -> CV: Video analysis and understanding
List of keywords 
Computer Vision -> CV: Vision and language Computer Vision -> CV: Video analysis and understanding
4476
Discrete Two Player All-Pay Auction with Complete Information
We study the discrete two-player all-pay auction with complete information. We provide a full characterization of mixed-strategy Nash equilibria and show that they constitute a subset of the Nash equilibria of the discrete General Lotto game. We show that equilibria are not unique in general, but they are interchangeable and the sets of equilibrium strategies are convex. We also show that equilibrium payoffs are unique, unless the valuation of at least one of the players is an even integer. If equilibrium payoffs are not unique, a continuum of equilibrium payoffs is possible.
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems Game Theory and Economic Paradigms -> GTEP: Noncooperative games
4480
Adversarial Contention Resolution Games
We study contention resolution (CR) on a shared channel modelled as a game with selfish players. There are n agents and the adversary chooses some k smaller than n of them as players. Each participating player in a CR game has a packet to transmit. A transmission is successful if it is the only one performed in its round. Each player aims to minimize its packet latency. We introduce the notion of adversarial equilibrium (AE), which incorporates adversarial selection of players. We develop efficient deterministic communication algorithms that are also AE. We characterize the price of anarchy in the CR games with respect to AE.
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Game Theory and Economic Paradigms -> GTEP: Mechanism design
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Game Theory and Economic Paradigms -> GTEP: Mechanism design
4482
Fair Division of a Graph into Compact Bundles
We study the computational complexity of fair division of indivisible items in an enriched model: there is an underlying graph on the set of items, and we have to allocate the items (i.e., the vertices of the graph) to a set of agents in such a way that (a) the allocation is fair (for appropriate notions of fairness) and (b) each agent receives a bundle of items (i.e., a subset of vertices) that induces a subgraph with a specific “nice structure.” This model has previously been studied in the literature with the nice structure being a connected subgraph. In this paper, we propose an alternative to connectivity in fair division. We introduce compact graphs, and look for fair allocations in which each agent receives a compact bundle of items. Through compactness, we attempt to capture the idea that every agent must receive a bundle of “closely related” items. We prove a host of hardness and tractability results with respect to fairness concepts such as proportionality, envy-freeness and maximin share guarantee.
Agent-based and Multi-agent Systems -> MAS: Resource allocation
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Fair division Agent-based and Multi-agent Systems -> MAS: Resource allocation
4484
On a Voter Model with Context-Dependent Opinion Adoption
Opinion diffusion is a crucial phenomenon in social networks, often underlying the way in which a collection of agents develops a consensus on relevant decisions.  Voter models are well-known theoretical models to study opinion spreading in social networks and structured populations. Their simplest version assumes that an updating agent will adopt the opinion of a neighboring agent chosen at random. These models allow us to study, for example, the probability that a certain opinion will fixate into a consensus opinion, as well as the expected time it takes for a consensus opinion to emerge. 
Standard voter models are oblivious to the opinions held by the agents involved in the opinion adoption process. We propose and study a context-dependent opinion spreading process on an arbitrary social graph, in which the probability that an agent abandons opinion a in favor of opinion b depends on both a and b. We discuss the relations of the model with existing voter models and then derive theoretical results for both the fixation probability and the expected consensus time for two opinions, for both the synchronous and the asynchronous update models.
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Agent theories and models Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
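As a rough illustration of the asynchronous process described in the abstract above, the sketch below simulates a voter model in which the chance of switching from opinion a to opinion b depends on the pair (a, b). The function name `simulate`, the `accept` table of switching probabilities, and the uniform choice of agent and neighbor are assumptions made for this example; they are not taken from the paper, which studies the process analytically.

```python
import random

def simulate(adj, opinions, accept, max_steps=10_000, seed=0):
    """Asynchronous context-dependent voter model (illustrative sketch).

    adj      -- dict: node -> list of neighbors (every node needs at least one)
    opinions -- dict: node -> current opinion
    accept   -- dict: (a, b) -> probability that a holder of a adopts b
                (hypothetical parameterization, not from the paper)
    Returns the final opinions and the number of steps taken.
    """
    rng = random.Random(seed)
    opinions = dict(opinions)          # do not mutate the caller's dict
    nodes = sorted(adj)
    for step in range(max_steps):
        if len(set(opinions.values())) == 1:
            return opinions, step      # consensus reached
        v = rng.choice(nodes)          # updating agent
        u = rng.choice(sorted(adj[v])) # neighbor chosen at random
        a, b = opinions[v], opinions[u]
        # Adoption probability depends on both the old and the new opinion.
        if a != b and rng.random() < accept[(a, b)]:
            opinions[v] = b
    return opinions, max_steps
```

Setting every entry of `accept` to 1 recovers the standard voter model, where the updating agent always copies the sampled neighbor.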
4492
Incentivizing Recourse through Auditing in Strategic Classification
The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits.
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Game Theory and Economic Paradigms -> GTEP: Other
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Societal impact of AI AI Ethics, Trust, Fairness -> ETF: Safety and robustness
Game Theory and Economic Paradigms -> GTEP: Other
4504
Explaining Answer-Set Programs with Abstract Constraint Atoms
Answer-Set Programming (ASP) is a popular declarative reasoning and problem solving formalism. Due to the increasing interest in explainability, several explanation approaches have been developed for ASP. However, support for commonly used advanced language features of ASP, such as aggregates or choice rules, is still mostly lacking. We deal with explaining ASP programs containing Abstract Constraint Atoms, which encompass the above features and others. We provide justifications for the presence, or absence, of an atom in a given answer-set. To this end, we introduce several formal notions of justification in this setting, based on the one hand on a semantic characterisation utilising minimal partial models, and on the other hand on a more rule-guided approach. We provide complexity results for checking and computing such justifications, and discuss how the semantic and syntactic approaches relate and can be jointly used to offer more insight.
Our results contribute to a basis for explaining commonly used language features and thus increase accessibility and usability of ASP as an AI tool.
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Logic programming Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
4505
Sequence Learning Using Equilibrium Propagation
Equilibrium Propagation (EP) is a powerful and more bio-plausible alternative to conventional learning frameworks such as backpropagation. The effectiveness of EP stems from the fact that it relies only on local computations and requires solely one kind of computational unit during both of its training phases, thereby enabling greater applicability in domains such as bio-inspired neuromorphic computing. The dynamics of the model in EP are governed by an energy function, and the internal states of the model consequently converge to a steady state following the state transition rules defined by that function. However, by definition, EP requires the input to the model (a convergent RNN) to be static in both phases of training. Thus it is not possible to design a model for sequence classification using EP with an LSTM- or GRU-like architecture. In this paper, we leverage recent developments in modern Hopfield networks to further understand energy-based models and develop solutions for complex sequence classification tasks using EP while satisfying its convergence criteria and maintaining its theoretical similarities with recurrent backpropagation. We explore the possibility of integrating modern Hopfield networks as an attention mechanism with convergent RNN models used in EP, thereby extending its applicability for the first time to two different sequence classification tasks in natural language processing, viz. sentiment analysis (IMDB dataset) and natural language inference (SNLI dataset). Our implementation source code is available at https://github.com/NeuroCompLab-psu/EqProp-SeqLearning.
Machine Learning -> ML: Attention models
List of keywords 
Humans and AI -> HAI: Cognitive systems Machine Learning -> ML: Attention models
4516
A Comparative Study of Ranking Formulas Based on Consistency
Ranking is ubiquitous in everyday life. This paper is concerned with the problem of ranking the information in a knowledge base when the latter is possibly inconsistent. In particular, the key issue is to elicit a plausibility order on the formulas in an inconsistent knowledge base. We show how such an ordering can be obtained by using only the inherent structure of the knowledge base. We start by introducing a set of principles that a reasonable ranking framework for formulas should satisfy. Then, a variety of ordering criteria are explored to define a plausibility order over formulas based on consistency. Finally, we study the behaviour of the different formula ranking semantics in terms of the proposed logical postulates as well as their (in)compatibility.
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Learning and reasoning Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
4518
Group Fairness in Set Packing Problems
Kidney exchange programs (KEPs) typically seek to match incompatible patient-donor pairs based on a utilitarian objective where the number or overall quality of transplants is maximized, implicitly penalizing certain classes of difficult-to-match (e.g., highly-sensitized) patients. Prioritizing the welfare of highly-sensitized (hard-to-match) patients has been studied as a natural fairness criterion.
We formulate the KEP problem as k-set packing with a probabilistic group fairness notion of proportionality fairness, namely fair k-set packing (FairSP). In this work we propose algorithms that take arbitrary proportionality vectors (i.e., policy-informed demands of how to prioritize different groups) and return a probabilistically fair solution with provable guarantees. Our main contributions are randomized algorithms as well as hardness results for FairSP variants. Additionally, the tools we introduce serve to audit the price of fairness involved in prioritizing different groups in realistic KEPs and other k-set packing applications. We conclude with experiments on synthetic and realistic kidney exchange FairSP instances.
Game Theory and Economic Paradigms -> GTEP: Other
Search -> S: Combinatorial search and optimisation
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity Game Theory and Economic Paradigms -> GTEP: Other
Search -> S: Combinatorial search and optimisation
4520
Discounting in Strategy Logic
Discounting is an important dimension in multi-agent systems as long as we want to reason about strategies and time. It is a key aspect in economics as it captures the intuition that the far-away future is not as important as the near future. Traditional verification techniques make it possible to check whether there is a winning strategy for a group of agents, but they do not take into account the fact that satisfying a goal sooner is different from satisfying it after a long wait.
In this paper, we augment Strategy Logic with future discounting over a set of discounted functions D, denoted SL[D]. We consider “until” operators with discounting functions: the satisfaction value of a specification in SL[D] is a value in [0, 1], where the longer it takes to fulfill requirements, the smaller the satisfaction value is. We motivate our approach with classical examples from Game Theory and study the complexity of model-checking SL[D]-formulas.
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning
Knowledge Representation and Reasoning -> KRR: Qualitative, geometric, spatial, and temporal reasoning
4526
Convergence in Multi-Issue Iterative Voting under Uncertainty
We study strategic behavior in iterative plurality voting for multiple issues under uncertainty. We introduce a model synthesizing simultaneous multi-issue voting with local dominance theory, in which agents repeatedly update their votes based on sets of vote profiles they deem possible, and determine its convergence properties. After demonstrating that local dominance improvement dynamics may fail to converge, we present two sufficient model refinements that guarantee convergence from any initial vote profile for binary issues: constraining agents to have O-legal preferences, where issues are ordered by importance, and endowing agents with less uncertainty about issues they are modifying than others. Our empirical studies demonstrate that while cycles are common for agents without uncertainty, introducing uncertainty makes convergence almost guaranteed in practice.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4527
Description Logics with Pointwise Circumscription
Circumscription is one of the most powerful ways to extend Description Logics (DLs) with non-monotonic reasoning features, albeit with huge computational costs and undecidability in many cases. In this paper, we introduce pointwise circumscription for DLs, which is not only intuitive in terms of knowledge representation, but also provides a sound approximation of classic circumscription and has reduced computational complexity. Our main idea is to replace the second-order quantification step of classic circumscription with a series of (pointwise) local checks on all domain elements and their immediate neighbourhood. Our main positive results are for ontologies in DLs ALCIO and ALCI: we prove that for TBoxes of modal depth 1 (i.e. without nesting of existential or universal quantifiers) standard reasoning problems under pointwise circumscription are (co)NExpTime-complete and ExpTime-complete, respectively. The restriction of modal depth still yields a large class of ontologies useful in practice, and it is further justified by a strong undecidability result for pointwise circumscription with general TBoxes in ALCIO.
Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
List of keywords 
Knowledge Representation and Reasoning -> KRR: Description logics and ontologies Knowledge Representation and Reasoning -> KRR: Non-monotonic reasoning
4557
Computing (1+epsilon)-Approximate Degeneracy in Sublinear Time
The problem of finding the degeneracy of a graph is a subproblem of the k-core decomposition problem. In this paper, we present a (1 + epsilon)-approximate solution to the degeneracy problem which runs in O(n log n) time, sublinear in the input size for dense graphs, by sampling a small number of neighbors adjacent to high degree nodes. This improves upon the previous work on sublinear approximate degeneracy, which implies a (4 + epsilon)-approximate ~O(n) solution. Our algorithm can be extended to an approximate O(n log n) time solution to the k-core decomposition problem. We also explore the use of our approximate algorithm as a technique for speeding up exact degeneracy computation. We prove theoretical guarantees of our algorithm and provide optimizations, which improve the running time of our algorithm in practice. Experiments on massive real-world web graphs show that our algorithm performs significantly faster than previous methods for computing degeneracy.
Multidisciplinary Topics and Applications -> MDA: Web and social networks
List of keywords 
Data Mining -> DM: Mining graphs Multidisciplinary Topics and Applications -> MDA: Web and social networks
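For readers unfamiliar with the quantity approximated in the abstract above, the sketch below computes the exact degeneracy of a small graph by the standard peeling procedure: repeatedly remove a vertex of minimum remaining degree and record the largest degree seen at removal time. This is a baseline illustration only, not the paper's sampling-based sublinear algorithm; the function name `degeneracy` and the adjacency-dict input format are assumptions for this example.

```python
import heapq

def degeneracy(adj):
    """Exact degeneracy of an undirected graph via min-degree peeling.

    adj -- dict: vertex -> set of neighbors (symmetric, no self-loops).
    Uses a heap with lazy deletion, so it runs in O(m log n) time.
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed = set()
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                    # stale heap entry, skip it
        removed.add(v)
        k = max(k, d)                   # degree of v when it was peeled
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return k
```

The order in which vertices are peeled also yields the k-core decomposition mentioned in the abstract: the running maximum `k` at the moment a vertex is removed is that vertex's core number.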
4572
Participatory Budgeting with Multiple Degrees of Projects and Ranged Approval Votes
In an indivisible participatory budgeting (PB) framework, we have a limited budget that is to be distributed among a set of projects, by aggregating the preferences of voters for the projects. All the prior work on indivisible PB assumes that each project has only one possible cost. In this work, we let each project have a set of permissible costs, each reflecting a possible degree of sophistication of the project. Each voter approves a range of costs for each project, by giving an upper and lower bound on the cost that she thinks the project deserves. The outcome of a PB rule selects a subset of projects and also specifies their corresponding costs. We study different utility notions and prove that the existing positive results when every project has exactly one permissible cost can also be extended to our framework where a project has several permissible costs. We also analyze the fixed parameter tractability of the problem. Finally, we propose some important and intuitive axioms and analyze their satisfiability by different PB rules. We conclude by making some crucial remarks.
Agent-based and Multi-agent Systems -> MAS: Resource allocation
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice Agent-based and Multi-agent Systems -> MAS: Resource allocation
4580
Multi-objective Search via Lazy and Efficient Dominance Checks
Multi-objective search can be used to model many real-world problems that require finding Pareto optimal paths from a specified start state to a specified goal state, while considering different cost metrics such as distance, time, and fuel. The performance of multi-objective search can be improved by making dominance checking (the operation that determines whether one path dominates another) more efficient. This was shown in practice by BOA*, a state-of-the-art bi-objective search algorithm, which outperforms previously existing bi-objective search algorithms in part because it adopts a lazy approach towards dominance checking. EMOA*, a recent multi-objective search algorithm, generalizes BOA* to more than two objectives using AVL trees for dominance checking.
In this paper, we first propose Linear-Time Multi-Objective A* (LTMOA*), a multi-objective search algorithm that implements more efficient dominance checking than EMOA* using simple data structures like arrays. We then propose an even lazier approach towards dominance checking, and the resulting algorithm, LazyLTMOA*, differs from EMOA* and LTMOA* by removing the dominance checking during node generation. Our experimental results show that LazyLTMOA* outperforms EMOA* by up to an order of magnitude in terms of runtime.
Robotics -> ROB: Motion and path planning
Search -> S: Combinatorial search and optimisation
List of keywords 
Search -> S: Heuristic search Robotics -> ROB: Motion and path planning
Search -> S: Combinatorial search and optimisation
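To make the notion of a dominance check from the abstract above concrete, the sketch below keeps a Pareto frontier of cost vectors in a plain Python list and tests each new vector with a linear scan. It only illustrates what the operation does; it is not the LTMOA* or LazyLTMOA* procedure, and the names `dominates` and `insert_nondominated` are assumptions for this example.

```python
def dominates(a, b):
    """True if cost vector a weakly dominates b (no worse in every objective)."""
    return all(x <= y for x, y in zip(a, b))

def insert_nondominated(frontier, cost):
    """Add cost to the frontier unless an existing entry dominates it.

    frontier -- list of cost tuples, none dominating another.
    Entries that the new cost dominates are pruned in place.
    Returns True if cost was added, False if it was rejected.
    """
    if any(dominates(f, cost) for f in frontier):
        return False
    # Keep only the entries that the new cost does not dominate.
    frontier[:] = [f for f in frontier if not dominates(cost, f)]
    frontier.append(cost)
    return True
```

Each call costs time linear in the frontier size, which is why search algorithms in this area care about how often, and at which point in the search, the check is performed.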
4586
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first performs masked language model pretraining with multilingual text-only data. Then we train this model on paired data in a supervised manner, while freezing a language-aware embedding layer. This allows inference even for languages not included in the paired data but present in the text-only data. Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
Natural Language Processing -> NLP: Language models
List of keywords 
Natural Language Processing -> NLP: Speech Natural Language Processing -> NLP: Language models
4601
A Mathematical Runtime Analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III)
The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is the most prominent multi-objective evolutionary algorithm for real-world applications.
While it performs evidently well on bi-objective optimization problems, empirical studies suggest that it is less effective when applied to problems with more than two objectives. A recent mathematical runtime analysis confirmed this observation by proving that the NSGA-II, for an exponential number of iterations, misses a constant factor of the Pareto front of the simple 3-objective OneMinMax problem.
In this work, we provide the first mathematical runtime analysis of the NSGA-III, a refinement of the NSGA-II aimed at better handling more than two objectives. 
We prove that the NSGA-III with sufficiently many reference points – a small constant factor more than the size of the Pareto front, as suggested for this algorithm – computes the complete Pareto front of the 3-objective OneMinMax benchmark in an expected number of O(n log n) iterations. This result holds for all population sizes (that are at least the size of the Pareto front). It shows a drastic advantage of the NSGA-III over the NSGA-II on this benchmark. The mathematical arguments used here and in the previous work on the NSGA-II suggest that similar findings are likely for other benchmarks with three or more objectives.
Search -> S: Heuristic search
List of keywords 
Search -> S: Evolutionary computation Search -> S: Heuristic search
4607
Probabilistic Temporal Logic for Reasoning about Bounded Policies
To build a theory of intention revision for agents operating in stochastic environments, we need a logic in which we can explicitly reason about their decision-making policies and those policies’ uncertain outcomes. Towards this end, we propose PLBP, a novel probabilistic temporal logic for Markov Decision Processes that allows us to reason about policies of bounded size. The logic is designed so that its expressive power is sufficient for the intended applications, whilst at the same time possessing strong computational properties. We prove that the satisfiability problem for our logic is decidable, and that its model checking problem is PSPACE-complete. This allows us to e.g. algorithmically verify whether an agent’s intentions are coherent, or whether a specific policy satisfies safety and/or liveness properties.
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
List of keywords 
Knowledge Representation and Reasoning -> KRR: Reasoning about actions Agent-based and Multi-agent Systems -> MAS: Agent theories and models
4629
REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing
This paper considers the problem of querying dirty databases, which may contain both erroneous facts and multiple names for the same entity. While both of these data quality issues have been widely studied in isolation, our contribution is a holistic framework for jointly deduplicating and repairing data. Our REPLACE framework follows a declarative approach, utilizing logical rules to specify under which conditions a pair of entity references can or must be merged and logical constraints to specify consistency requirements. The semantics defines a space of solutions, each consisting of a set of merges to perform and a set of facts to delete, which can be further refined by applying optimality criteria. As there may be multiple optimal solutions, we use classical notions of possible and certain query answers to reason over the alternative solutions, and introduce a novel notion of most informative answer to obtain a more compact presentation of query results. We perform a detailed analysis of the data complexity of the central reasoning tasks of recognizing optimal solutions and (most informative) possible and certain answers, for each of the three notions of optimal solution and for both general and restricted specifications.
Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Multidisciplinary Topics and Applications -> MDA: Databases
List of keywords 
Knowledge Representation and Reasoning -> KRR: Computational complexity of reasoning Knowledge Representation and Reasoning -> KRR: Knowledge representation languages
Multidisciplinary Topics and Applications -> MDA: Databases
4636
Pseudo-Labeling Enhanced by Privileged Information and Its Application to In Situ Sequencing Images
Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. For example, most semi-supervised learning strategies rely on a small set of labeled data as a confident source of ground truth. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. In this work, we frame a crucial problem in spatial transcriptomics – decoding barcodes from In-Situ-Sequencing (ISS) images – as a semi-supervised object detection (SSOD) problem. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher’s pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well as on the COCO benchmark using extra evidence provided by CLIP.
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Life sciences Multidisciplinary Topics and Applications -> MDA: Bioinformatics
4650
NeuPSL: Neural Probabilistic Soft Logic
In this paper, we introduce Neural Probabilistic Soft Logic (NeuPSL), a novel neuro-symbolic (NeSy) framework that unites state-of-the-art symbolic reasoning with the low-level perception of deep neural networks. To model the boundary between neural and symbolic representations, we propose a family of energy-based models, NeSy Energy-Based Models, and show that they are general enough to include NeuPSL and many other NeSy approaches. Using this framework, we show how to seamlessly integrate neural and symbolic parameter learning and inference in NeuPSL. Through an extensive empirical evaluation, we demonstrate the benefits of using NeSy methods, achieving upwards of 30% improvement over independent neural network models. On a well-established NeSy task, MNIST-Addition, NeuPSL demonstrates its joint reasoning capabilities by outperforming existing NeSy approaches by up to 10% in low-data settings. Furthermore, NeuPSL achieves a 5% boost in performance over state-of-the-art NeSy methods in a canonical citation network task with up to a 40 times speed up.
Machine Learning -> ML: Structured prediction
List of keywords 
Machine Learning -> ML: Symbolic methods Machine Learning -> ML: Structured prediction
4651
Modeling with Homophily Driven Heterogeneous Data in Gossip Learning
Training deep learning models on data distributed and local to edge devices such as mobile phones is a prominent recent research direction. In a Gossip Learning (GL) system, each participating device maintains a model trained on its local data and iteratively aggregates it with the models from its neighbours in a communication network. While the fully distributed operation in GL comes with natural advantages over the centralized orchestration in Federated Learning (FL), its convergence becomes particularly slow when the data distribution is heterogeneous and aligns with the clustered structure of the communication network. These characteristics are pervasive across practical applications as people with similar interests (thus producing similar data) tend to create communities.
This paper proposes a data-driven neighbor weighting strategy for aggregating the models: this enables faster diffusion of knowledge across the communities in the network and leads to quicker convergence. We augment the method to make it computationally efficient and fair: the devices quickly converge to the same model. We evaluate our model on real and synthetic datasets that we generate using a novel generative model for communication networks with heterogeneous data. Our exhaustive empirical evaluation  verifies that our proposed method attains a faster convergence rate than the baselines. For example, the median test accuracy for a decentralized bird image classifier application reaches 81% with our proposed method within 80 rounds, whereas the baseline only reaches 46%.
List of keywords 
Machine Learning -> ML: Federated learning
4653
Hierarchical Apprenticeship Learning for Disease Progression Modeling
Disease progression modeling (DPM) plays an essential role in characterizing patients’ historical pathways and predicting their future risks. Apprenticeship learning (AL) aims to induce decision-making policies by observing and imitating expert behaviors. In this paper, we investigate the incorporation of AL-derived patterns into DPM, utilizing a Time-aware Hierarchical EM Energy-based Subsequence (THEMES) AL approach. To the best of our knowledge, this is the first study incorporating AL-derived progressive and interventional patterns for DPM. We evaluate the efficacy of this approach in a challenging task of septic shock early prediction, and our results demonstrate that integrating the AL-derived patterns significantly enhances the performance of DPM.
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Health and medicine
List of keywords 
Data Mining -> DM: Mining spatial and/or temporal data Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Health and medicine
4655
Temporal Network Creation Games
Most networks are not static objects, but instead they change over time. This observation has sparked rigorous research on temporal graphs in recent years. In temporal graphs, we have a fixed set of nodes and the connections between them are only available at certain time steps. This gives rise to a plethora of algorithmic problems on such graphs, most prominently the problem of finding temporal spanners, i.e., the computation of subgraphs that guarantee all-pairs reachability via temporal paths. To the best of our knowledge, only centralized approaches for the solution of this problem are known. However, many real-world networks are not shaped by a central designer but instead they emerge and evolve by the interaction of many strategic agents. This observation is the driving force of the recent intensive research on game-theoretic network formation models.
In this work we bring together these two recent research directions: temporal graphs and game-theoretic network formation. As a first step into this new realm, we focus on a simplified setting where a complete temporal host graph is given and the agents, corresponding to its nodes, selfishly create incident edges to ensure that they can reach all other nodes via temporal paths in the created network. This yields temporal spanners as equilibria of our game. We prove results on the convergence to and the existence of equilibrium networks, on the complexity of finding best agent strategies, and on the quality of the equilibria. By taking these first important steps, we uncover challenging open problems that call for an in-depth exploration of the creation of temporal graphs by strategic agents.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
4658
Ties in Multiwinner Approval Voting
We study the complexity of deciding if there is a tie in a given approval-based multiwinner election, as well as the complexity of counting tied winning committees. We consider a family of Thiele rules, their greedy variants, Phragmen’s sequential rule, and Method of Equal Shares. For most cases, our problems are computationally hard, but for sequential rules we find an FPT algorithm for discovering ties (parameterized by the committee size). We also show experimentally that in elections of moderate size ties are quite frequent.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4663
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas.
In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner’s Dilemma, Volunteer’s Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
AI Ethics, Trust, Fairness -> ETF: Moral decision making
4665
Cross-community Adapter Learning (CAL) to Understand the Evolving Meanings of Norm Violation
Cross-community learning incorporates data from different sources to leverage task-specific solutions in a target community. This approach is particularly interesting for low-resource or newly created online communities, where data formalizing interactions between agents (community members) are limited. In such scenarios, a normative system that intends to regulate online interactions faces the challenge of continuously learning the meaning of norm violation as communities’ views evolve, either with changes in the understanding of what it means to violate a norm or with the emergence of new violation classes. To address this issue, we propose the Cross-community Adapter Learning (CAL) framework, which combines adapters and transformer-based models to learn the meaning of norm violations expressed as textual sentences. Additionally, we analyze the differences in the meaning of norm violations between communities, using Integrated Gradients (IG) to understand the inner workings of our model and calculate a global relevance score that indicates the relevance of words for violation detection. Results show that cross-community learning enhances CAL’s performance while explaining the differences in the meaning of norm-violating behavior based on community members’ feedback. We evaluate our proposal in a small set of interaction data from Wikipedia, in which the norm prohibits hate speech.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Normative systems
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Incremental learning
4671
Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption.
Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents’ tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are ‘close’ to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the ‘distance’ to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the ‘nearest’ network zero-sum game can be found efficiently. Importantly, our theoretical guarantees are widely applicable in different game settings, regardless of whether the dynamics ultimately reach an equilibrium or remain non-convergent.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Machine Learning -> ML: Reinforcement learning
4672
Schelling Games with Continuous Types
In most major cities and urban areas, residents form homogeneous neighborhoods along ethnic or socioeconomic lines. This phenomenon is widely known as residential segregation and has been studied extensively. Fifty years ago, Schelling proposed a landmark model that explains residential segregation in an elegant agent-based way. A recent stream of papers analyzed Schelling’s model using game-theoretic approaches. However, all these works considered models with a given number of discrete types modeling different ethnic groups.
We focus on segregation caused by non-categorical attributes, such as household income or position in a political left-right spectrum. For this, we consider agent types that can be represented as real numbers. This opens up a great variety of reasonable models and, as a proof of concept, we focus on several natural candidates. In particular, we consider agents that evaluate their location by the average type-difference or the maximum type-difference to their neighbors, or by having a certain tolerance range for type-values of neighboring agents. We study the existence and computation of equilibria and provide bounds on the Price of Anarchy and Stability. Also, we present simulation results that compare our models and shed light on the obtained equilibria for our variants.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Noncooperative games
Agent-based and Multi-agent Systems -> MAS: Agent theories and models
Agent-based and Multi-agent Systems -> MAS: Agent-based simulation and emergence
4691
Ordinal Hedonic Seat Arrangement under Restricted Preference Domains: Swap Stability and Popularity
We study a variant of hedonic games, called hedonic seat arrangements in the literature, where the goal is not to partition the agents into coalitions but to assign them to vertices of a given graph; their satisfaction is then based on the subset of agents in their neighborhood. We focus on ordinal hedonic seat arrangements where the preferences over neighborhoods are deduced from ordinal preferences over single agents and a given preference extension. In such games and for different types of preference restrictions and extensions, we investigate the existence of arrangements satisfying stability w.r.t. swaps of positions in the graph or the well-known optimality concept of popularity.
List of keywords
Game Theory and Economic Paradigms -> GTEP: Computational social choice
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Resource allocation
4714
BRExIt: On Opponent Modelling in Expert Iteration
Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt’s algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt. Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code. Supplementary material available at https://arxiv.org/abs/2206.00113.
List of keywords
Machine Learning -> ML: Deep reinforcement learning
Machine Learning -> ML: Reinforcement learning
Search -> S: Game playing
4724
A Symbolic Approach to Computing Disjunctive Association Rules from Data
Association rule mining is one of the most well-studied and important knowledge discovery tasks in data mining. In this paper, we first introduce the k-disjunctive support based itemset, a generalization of the traditional model of itemset by allowing the absence of up to k items in each transaction matching the itemset. Then, to discover more expressive rules from data, we define the concept of (k, k′)-disjunctive support based association rules by considering the antecedent and the consequent of the rule as k-disjunctive and k′-disjunctive support based itemsets, respectively. Second, we provide a polynomial-time reduction of both the problems of mining k-disjunctive support based itemsets and (k, k′)-disjunctive support based association rules to the propositional satisfiability model enumeration task. Finally, we show through an extensive campaign of experiments on several popular real-life datasets the efficiency of our proposed approach.
List of keywords
Data Mining -> DM: Frequent pattern mining
Constraint Satisfaction and Optimization -> CSO: Modeling
Constraint Satisfaction and Optimization -> CSO: Satisfiability
4730
Spatially Covariant Lesion Segmentation
Compared to natural images, medical images usually show stronger visual patterns, which adds flexibility and elasticity to resource-limited clinical applications by allowing proper priors to be injected into neural networks.
In this paper, we propose a spatially covariant pixel-aligned classifier (SCP) to improve computational efficiency while maintaining or increasing accuracy for lesion segmentation.
SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights, the parameters of which are obtained along with the backbone network training and later used for generating network weights to capture spatially covariant contextual information.
We demonstrate the effectiveness and efficiency of the proposed SCP using two lesion segmentation tasks from different imaging modalities: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography.
The network using SCP has achieved 23.8%, 64.9% and 74.7% reductions in GPU memory usage, FLOPs, and network size, respectively, with similar or better accuracy for lesion segmentation.
List of keywords
Computer Vision -> CV: Segmentation
Computer Vision -> CV: Biomedical image analysis
Machine Learning -> ML: Convolutional networks
4732
On Discovering Interesting Combinatorial Integer Sequences
We study the problem of generating interesting integer sequences with a combinatorial interpretation. For this we introduce a two-step approach. In the first step, we generate first-order logic sentences which define some combinatorial objects, e.g., undirected graphs, permutations, matchings, etc. In the second step, we use algorithms for lifted first-order model counting to generate integer sequences that count the objects encoded by the first-order logic formulas generated in the first step. For instance, if the first-order sentence defines permutations then the generated integer sequence is the sequence of factorial numbers n!. We demonstrate that our approach is able to generate interesting new sequences by showing that a non-negligible fraction of the automatically generated sequences can actually be found in the On-Line Encyclopedia of Integer Sequences (OEIS) while generating many other similar sequences which are not present in OEIS and which are potentially interesting. A key technical contribution of our work is the method for generation of first-order logic sentences which is able to drastically prune the space of sentences by discarding a large fraction of sentences which would lead to redundant integer sequences.
List of keywords 
Knowledge Representation and Reasoning -> KRR: Other
4734
Explanation-Guided Reward Alignment
Agents often need to infer a reward function from observations to learn desired behaviors. However, agents may infer a reward function that does not align with the original intent because there can be multiple reward functions consistent with their observations. Operating based on such misaligned rewards can be risky. Furthermore, black-box representations make it difficult to verify the learned rewards and prevent harmful behavior. We present a framework for verifying and improving reward alignment using explanations and show how explanations can help detect misalignment and reveal failure cases in novel scenarios. The problem is formulated as inverse reinforcement learning from ranked trajectories. Verification tests created from the trajectory dataset are used to iteratively validate and improve reward alignment. The agent explains its learned reward and a tester signals whether the explanation passes the test. In cases where the explanation fails, the agent offers alternative explanations to gather feedback, which is then used to improve the learned reward. We analyze the efficiency of our approach in improving reward alignment using different types of explanations and demonstrate its effectiveness in five domains.
List of keywords
AI Ethics, Trust, Fairness -> ETF: Safety and robustness
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Machine Learning -> ML: Reinforcement learning
4757
Context-Aware Feature Selection and Classification
We propose a joint model that performs instance-level feature selection and classification. For a given case, the joint model first skims the full feature vector, decides which features are relevant for that case, and makes a classification decision using only the selected features, resulting in compact, interpretable, and case-specific classification decisions. Because the selected features depend on the case at hand, we refer to this approach as context-aware feature selection and classification. The model can be trained on instances that are annotated by experts with both class labels and instance-level feature selections, so it can select instance-level features that humans would use. Experiments on several datasets demonstrate that the proposed model outperforms eight baselines on a combined classification and feature selection measure, and is able to better emulate the ground-truth instance-level feature selections. The supplementary materials are available at https://github.com/IIT-ML/IJCAI23-CFSC.
List of keywords
Machine Learning -> ML: Feature extraction, selection and dimensionality reduction
Machine Learning -> ML: Classification
Machine Learning -> ML: Explainable/Interpretable machine learning
4760
Diversity, Agreement, and Polarization in Elections
We consider the notions of agreement, diversity, and polarization in ordinal elections (that is, in elections where voters rank the candidates). While (computational) social choice offers good measures of agreement between the voters, such measures for the other two notions are lacking. We attempt to rectify this issue by designing appropriate measures, providing means of their (approximate) computation, and arguing that they, indeed, capture diversity and polarization well. In particular, we present "maps of preference orders" that highlight relations between the votes in a given election and which help in making arguments about their nature.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
4765
Local and Global: Temporal Question Answering via Information Fusion
Many models that leverage knowledge graphs (KGs) have recently demonstrated remarkable success in question answering (QA) tasks. In the real world, many facts contained in KGs are time-constrained, and thus temporal KGQA has received increasing attention. Despite the fruitful efforts of previous models in temporal KGQA, they still have several limitations. (I) They neither emphasize the graph structural information between entities in KGs nor explicitly utilize a multi-hop relation path through graph neural networks to enhance answer prediction. (II) They adopt pre-trained language models (LMs) to obtain question representations, focusing merely on the global information related to the question while not highlighting the local information of the entities in KGs. To address these limitations, we introduce a novel model that simultaneously explores both Local information and Global information for the task of temporal KGQA (LGQA). Specifically, we first introduce an auxiliary task in the temporal KG embedding procedure to make timestamp embeddings time-order aware. Then, we design information fusion layers that effectively incorporate local and global information to deepen question understanding. We conduct extensive experiments on two benchmarks, and LGQA significantly outperforms previous state-of-the-art models, especially on difficult questions. Moreover, LGQA can generate interpretable and trustworthy predictions.
List of keywords 
Natural Language Processing -> NLP: Question answering
4766
Toward Convex Manifolds: A Geometric Perspective for Deep Graph Clustering of Single-cell RNA-seq Data
The deep clustering paradigm has shown great potential for discovering complex patterns that can reveal cell heterogeneity in single-cell RNA sequencing data. This paradigm involves two training phases: pretraining based on a pretext task and fine-tuning using pseudo-labels. Although current models yield promising results, they overlook the geometric distortions that regularly occur during the training process. More precisely, the transition between the two phases results in a coarse flattening of the latent structures, which can deteriorate the clustering performance. In this context, existing methods perform Euclidean-based embedding clustering without ensuring the flatness and convexity of the latent manifolds. To address this problem, we incorporate two mechanisms. First, we introduce an overclustering loss to flatten the local curves. Second, we propose an adversarial mechanism to adjust the global geometric configuration. The second mechanism gradually transforms the latent structures into convex ones. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
List of keywords
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
Machine Learning -> ML: Clustering
Machine Learning -> ML: Unsupervised learning
4777
Vision Language Navigation with Knowledge-driven Environmental Dreamer
Vision-language navigation (VLN) requires an agent to perceive visual observations in a house scene and navigate step-by-step following natural language instructions. Due to the high cost of data annotation and data collection, current VLN datasets provide limited instruction-trajectory data samples. Learning vision-language alignment for VLN from limited data is challenging since visual observations and language instructions are both complex and diverse. Previous works only generate augmented data based on original scenes while failing to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a method that leverages the knowledge of the embodied environment and generates unseen scenes for a navigation agent to learn. Generating an unseen environment with texture consistency and structure consistency is challenging. To address this problem, we incorporate three knowledge-driven regularization objectives into the KED and adopt a reweighting mechanism for self-adaptive optimization. Our KED method is able to generate unseen embodied environments without extra annotations. We use KED to successfully generate 270 houses and 500K instruction-trajectory pairs. The navigation agent with the KED method outperforms the state-of-the-art methods on various VLN benchmarks, such as R2R, R4R, and RxR. Both qualitative and quantitative experiments show that our proposed KED method is able to generate high-quality augmentation data with texture consistency and structure consistency.
List of keywords
Computer Vision -> CV: Vision and language
Robotics -> ROB: Robotics and vision
4783
Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives
This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences for accomplishing each temporal task. The finite traces that describe the system’s behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.
List of keywords
Agent-based and Multi-agent Systems -> MAS: Formal verification, validation and synthesis
Knowledge Representation and Reasoning -> KRR: Preference modelling and preference-based reasoning
Planning and Scheduling -> PS: Markov decision processes
4787
Discriminative-Invariant Representation Learning for Unbiased Recommendation
Selection bias hinders recommendation models from learning unbiased user preference. Recent works empirically reveal that pursuing invariant user and item representation across biased and unbiased data is crucial for counteracting selection bias. However, our theoretical analysis reveals that simply optimizing representation invariance is insufficient for addressing the selection bias: recommendation performance is bounded by both representation invariance and discriminability. Worse still, current invariant representation learning methods in recommendation neglect, or even hurt, the representation discriminability due to data sparsity and label shift. In this light, we propose a new Discriminative-Invariant Representation Learning framework for unbiased recommendation, which incorporates label-conditional clustering and prior-guided contrasting into conventional invariant representation learning to mitigate the impact of data sparsity and label shift, respectively. We conduct extensive experiments on three real-world datasets, validating the rationality and effectiveness of the proposed framework. Code and supplementary materials are available at: https://github.com/HungPaan/DIRL.
List of keywords
Data Mining -> DM: Recommender systems
AI Ethics, Trust, Fairness -> ETF: Bias
4799
Safety Verification and Universal Invariants for Relational Action Bases
Modeling and verification of dynamic systems operating over a relational representation of states are increasingly investigated problems in AI, Business Process Management and Database Theory. To make these systems amenable to verification, the amount of information stored in each state needs to be bounded, or restrictions are imposed on the preconditions and effects of actions. We lift these restrictions by introducing the framework of Relational Action Bases (RABs), which generalizes existing frameworks and in which unbounded relational states are evolved through actions that can (1) quantify both existentially and universally over the data, and (2) use arithmetic constraints. We then study parameterized safety of RABs via (approximated) SMT-based backward search, singling out essential meta-properties of the resulting procedure, and showing how it can be realized by an off-the-shelf combination of existing verification modules of the state-of-the-art MCMT model checker. We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes. Finally, we show how universal invariants can be exploited to make this procedure fully correct.
List of keywords
Knowledge Representation and Reasoning -> KRR: Reasoning about actions
Knowledge Representation and Reasoning -> KRR: Automated reasoning and theorem proving
Knowledge Representation and Reasoning -> KRR: Causality
4804
Incentive-Compatible Selection for One or Two Influentials
Selecting influentials in networks against strategic manipulations has attracted many researchers’ attention and it also has many practical applications. Here, we aim to select one or two influentials in terms of progeny (the influential power) and prevent agents from manipulating their edges (incentive compatibility). The existing studies mostly focused on selecting a single influential for this setting. Zhang et al. [2021] studied the problem of selecting one agent and proved an upper bound of 1/(1+ln2) to approximate the optimal selection. In this paper, we first design a mechanism to actually reach the bound. Then, we move this forward to choosing two agents and propose a mechanism to achieve an approximation ratio of (3+ln2)/(4(1+ln2)) (approx. 0.54).
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Mechanism design
4826
Case-Based Reasoning with Language Models for Classification of Logical Fallacies
The ease and speed of spreading misinformation and propaganda on the Web motivate the need to develop trustworthy technology for detecting fallacies in natural language arguments. However, state-of-the-art language modeling methods exhibit a lack of robustness on tasks like logical fallacy classification that require complex reasoning. In this paper, we propose a Case-Based Reasoning method that classifies new cases of logical fallacy by language-modeling-driven retrieval and adaptation of historical cases. We design four complementary strategies to enrich input representation for our model, based on external information about goals, explanations, counterarguments, and argument structure. Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies suggest that representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. Finally, we dive deeper into the relationship between the properties of the retrieved cases and the model performance.
List of keywords
Natural Language Processing -> NLP: Information retrieval and text mining
AI Ethics, Trust, Fairness -> ETF: Explainability and interpretability
Knowledge Representation and Reasoning -> KRR: Case-based reasoning
4842
A New ANN-SNN Conversion Method with High Accuracy, Low Latency and Good Robustness
Due to the advantages of low energy consumption, high robustness and fast inference speed, Spiking Neural Networks (SNNs), with good biological interpretability and the potential to be applied on neuromorphic hardware, are regarded as the third generation of Artificial Neural Networks (ANNs). Despite these advantages, the biggest challenge encountered by spiking neural networks is the training difficulty caused by the non-differentiability of spike signals. ANN-SNN conversion is an effective method that solves the training difficulty by converting parameters in ANNs to those in SNNs through a specific algorithm. However, the ANN-SNN conversion method also suffers from accuracy degradation and long inference time. In this paper, we reanalyze the relationship between the Integrate-and-Fire (IF) neuron model and the ReLU activation function, propose a StepReLU activation function more suitable for SNNs under membrane potential encoding, and use it to train ANNs. We then convert the ANNs to SNNs with extremely small conversion error and introduce a leakage mechanism into the SNNs to obtain the final models, which have high accuracy, low latency and good robustness, and achieve state-of-the-art performance on various datasets such as CIFAR and ImageNet.
List of keywords
Humans and AI -> HAI: Cognitive modeling
Machine Learning -> ML: Classification
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Robustness
4853
HOUDINI: Escaping from Moderately Constrained Saddles
We give polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function, we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. While analogous results exist for unconstrained and equality-constrained problems, we make progress on the major open question of convergence to second-order stationary points in the case of inequality constraints, without reliance on NP-oracles or altering the definitions to only account for certain constraints. Our results hold for both regular and stochastic gradient descent.
List of keywords 
Machine Learning -> ML: Optimization
4870
TITAN: Task-oriented Dialogues with Mixed-Initiative Interactions
In multi-domain task-oriented dialogue systems, users proactively propose a series of domain-specific requests that can often be under- or over-specified, sometimes with ambiguous and cross-domain demands. System-side initiative is necessary to identify such situations and interact appropriately with users to resolve them. However, most existing task-oriented dialogue systems fail to consider such mixed-initiative interaction strategies, resulting in low efficiency and poor collaboration ability in human-computer conversation. In this paper, we construct a multi-domain task-oriented dialogue dataset with mixed-initiative strategies, named TITAN, from the large-scale dialogue corpus MultiWOZ 2.1. It contains a total of 1,800 human-human conversations in which the system can either actively ask clarification questions or provide relevant information to address failure situations and implicit user requests. We report the results of several baseline models on system response generation and dialogue act prediction to assess the performance of SOTA methods on TITAN. These models can capture mixed-initiative dialogue acts, but remain deficient in actively generating implicit requests and accurately providing alternative information, suggesting ample room for improvement in future studies.
Natural Language Processing -> NLP: Resources and evaluation
Natural Language Processing -> NLP: Language generation
List of keywords 
Natural Language Processing -> NLP: Dialogue and interactive systems
Natural Language Processing -> NLP: Resources and evaluation
Natural Language Processing -> NLP: Language generation
4929
Denoised Self-Augmented Learning for Social Recommendation
Social recommendation is gaining increasing attention in various online applications, including e-commerce and online streaming, where social information is leveraged to improve user-item interaction modeling. Recently, Self-Supervised Learning (SSL) has proven to be remarkably effective in addressing data sparsity through augmented learning tasks. Inspired by this, researchers have attempted to incorporate SSL into social recommendation by supplementing the primary supervised task with social-aware self-supervised signals. However, social information can be unavoidably noisy in characterizing user preferences due to the ubiquitous presence of interest-irrelevant social connections, such as colleagues or classmates who do not share many common interests. To address this challenge, we propose a novel social recommender called the Denoised Self-Augmented Learning paradigm (DSL). Our model not only preserves helpful social relations to enhance user-item interaction modeling but also enables personalized cross-view knowledge transfer through adaptive semantic alignment in embedding space. Our experimental results on various recommendation benchmarks confirm the superiority of our DSL over state-of-the-art methods. We release our model implementation at: https://github.com/HKUDS/DSL.
Data Mining -> DM: Recommender systems
List of keywords 
Data Mining -> DM: Information retrieval
Data Mining -> DM: Recommender systems
4930
Online Task Assignment with Controllable Processing Time
We study a new online assignment problem, called Online Task Assignment with Controllable Processing Time. In a bipartite graph, a set of online vertices (tasks) should be assigned to a set of offline vertices (machines) under the known adversarial distribution (KAD) assumption. We are the first to study controllable processing time in this scenario: there are multiple processing levels for each task, and a higher level brings larger utility but also a larger processing delay.
A machine can reject an assignment at the cost of a rejection penalty, taken from a pre-determined rejection budget. Different processing levels cause different penalties. We propose the Online Machine and Level Assignment (OMLA) Algorithm to simultaneously assign an offline machine and a processing level to each online task. We prove that OMLA achieves a 1/2-competitive ratio if each machine has an unlimited rejection budget and a Δ/(3Δ-1)-competitive ratio if each machine has an initial rejection budget of up to Δ. Interestingly, the competitive ratios do not change under different settings of the controllable processing time, so we can conclude that OMLA is "insensitive" to the controllable processing time.
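To make the two stated competitive ratios concrete, the sketch below simply evaluates the expressions from the abstract with exact fractions. The function name `omla_ratio` and the restriction to integer Δ ≥ 1 are assumptions for illustration and are not taken from the paper.

```python
from fractions import Fraction


def omla_ratio(delta=None):
    """Evaluate the competitive ratio stated in the abstract.

    delta=None stands for the unlimited-budget case (ratio 1/2).
    Otherwise delta is assumed to be an integer budget bound >= 1,
    giving delta / (3 * delta - 1).
    """
    if delta is None:
        return Fraction(1, 2)
    if delta < 1:
        raise ValueError("this sketch assumes delta >= 1")
    return Fraction(delta, 3 * delta - 1)


if __name__ == "__main__":
    for d in (1, 2, 5, 100):
        print(d, omla_ratio(d), float(omla_ratio(d)))
```

For Δ = 1 the expression equals 1/2, and as Δ grows it decreases toward 1/3.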
Planning and Scheduling -> PS: Planning under uncertainty
List of keywords 
Planning and Scheduling -> PS: Scheduling
Planning and Scheduling -> PS: Planning under uncertainty
4936
Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks
Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Previous works argue that complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity to measure the difference between the current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine the joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms state-of-the-art methods in the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task, which needs non-trivial strategies to defeat enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.
Machine Learning -> ML: Deep reinforcement learning
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning
Machine Learning -> ML: Deep reinforcement learning
4961
Capturing the Long-Distance Dependency in the Control Flow Graph via Structural-Guided Attention for Bug Localization
To alleviate the burden of software maintenance, bug localization, which aims to automatically locate the buggy source files based on the bug report, has drawn significant attention in the software mining community. Recent studies indicate that the program structure in source code carries more semantics reflecting the program behavior, which is beneficial for bug localization. Benefiting from the rich structural information in the Control Flow Graph (CFG), CFG-based bug localization methods have achieved state-of-the-art performance. Existing CFG-based methods extract the semantic feature from the CFG via a graph neural network. However, the step-wise feature propagation in the graph neural network suffers from information loss when the propagation distance is long, while long-distance dependencies are rather common in the CFG. In this paper, we argue that the long-distance dependency is crucial for feature extraction from the CFG, and propose a novel bug localization model named sgAttention. In sgAttention, a specially designed structural-guided attention is employed to globally capture the information in the CFG, where features of irrelevant nodes are masked for each node to facilitate better feature extraction from the CFG. Experimental results on four widely used open-source software projects indicate that sgAttention improves on the state-of-the-art bug localization methods by an average of 32.9% and 29.2%, and on the state-of-the-art pre-trained models by 5.8% and 4.9%, in terms of MAP and MRR, respectively.
List of keywords 
Data Mining -> DM: Mining codebase and software repositories
4969
Towards an Integrated View of Semantic Annotation for POIs with Spatial and Textual Information
Categories of Points of Interest (POIs) facilitate location-based services in many respects, such as location search and POI recommendation. However, POI categories are often incomplete and new POIs are constantly being generated, which raises the demand for semantic annotation of POIs, i.e., labeling a POI with a semantic category. Previous methods usually model sequential check-in information of users to learn POI features for annotation. However, users’ check-ins are hard to obtain in practice, especially for newly created POIs. In this context, we present a Spatial-Textual POI Annotation (STPA) model for static POIs, which derives POI categories using only the geographic locations and names of POIs. Specifically, we design a GCN-based spatial encoder to model spatial correlations among POIs to generate POI spatial embeddings, and an attention-based text encoder to model the semantic contexts of POIs to generate POI textual embeddings. We finally fuse the two embeddings and preserve multi-view correlations for semantic annotation. We conduct comprehensive experiments to validate the effectiveness of STPA with POI data from AMap. Experimental results demonstrate that STPA substantially outperforms several competitive baselines, indicating that STPA is a promising approach for annotating static POIs in map services.
List of keywords 
Data Mining -> DM: Mining spatial and/or temporal data
4973
Exploiting Non-Interactive Exercises in Cognitive Diagnosis
Cognitive Diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical student-exercise interaction logs to assess proficiency levels. Despite their effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail consisting of students with few records lacks supervision signals. This phenomenon leads to inferior diagnosis for students with few records. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order between observed and unobserved responses as auxiliary ranking-based training signals to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses for effective supplementary training. Then, we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the superiority of our framework on long-tailed data.
Multidisciplinary Topics and Applications -> MDA: Education
List of keywords 
Data Mining -> DM: Applications
Multidisciplinary Topics and Applications -> MDA: Education
4991
A Hierarchical Approach to Population Training for Human-AI Collaboration
A major challenge for deep reinforcement learning (DRL) agents is to collaborate with novel partners that they did not encounter during the training phase. This is made worse by an increased variance in action responses when the DRL agents collaborate with human partners, due to the lack of consistency in human behaviors. Recent work has shown that training a single agent as the best response to a diverse population of training partners significantly increases an agent’s robustness to novel partners. We further enhance the population-based training approach by introducing a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration. Our agent is able to learn multiple best-response policies as its low-level policy while, at the same time, it learns a high-level policy that acts as a manager, allowing the agent to dynamically switch between the low-level best-response policies based on its current partner. We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment. We also conducted a human study in the same environment to test the effectiveness of our method when partnering with real human subjects. Code is available at https://gitlab.com/marvl-hipt/hipt.
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
List of keywords 
Humans and AI -> HAI: Human-AI collaboration
Machine Learning -> ML: Deep reinforcement learning
Agent-based and Multi-agent Systems -> MAS: Human-agent interaction
5011
GPMO: Gradient Perturbation-Based Contrastive Learning for Molecule Optimization
Optimizing molecules with desired properties is a crucial step in de novo drug design. While translation-based methods have achieved initial success, they continue to face the challenge of the “exposure bias” problem. The challenge of preventing the “exposure bias” problem in molecule optimization lies in the need for both positive and negative molecules for contrastive learning. That is because generating positive molecules through data augmentation requires domain-specific knowledge, and randomly sampled negative molecules are easily distinguished from the real molecules. Hence, in this work, we propose a molecule optimization method called GPMO, which leverages a gradient perturbation-based contrastive learning method to prevent the “exposure bias” problem in translation-based molecule optimization. With the assistance of positive and negative molecules, GPMO is able to effectively handle both real and artificial molecules. GPMO is a molecule optimization method that is conditioned on matched molecule pairs for drug discovery. Our empirical studies show that GPMO outperforms the state-of-the-art molecule optimization methods. Furthermore, the negative and positive perturbations improve the robustness of GPMO.
List of keywords 
Multidisciplinary Topics and Applications -> MDA: Bioinformatics
5012
Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder
Most existing studies on Sign Language Translation (SLT) employ an AutoRegressive Decoding Mechanism (AR-DM) to generate target sentences. However, the main disadvantage of the AR-DM is high inference latency. To address this problem, we introduce a Non-AutoRegressive Decoding Mechanism (NAR-DM) into SLT, which generates the whole sentence at once. Meanwhile, to improve its decoding ability, we integrate the advantages of curriculum learning and NAR-DM and propose a Curriculum-based NAR Decoder (CND). Specifically, the lower layers of the CND are expected to predict simple tokens that can be predicted correctly using source-side information alone, while the upper layers predict complex tokens based on the lower layers’ predictions. As a result, our CND significantly reduces the model’s inference latency while maintaining competitive performance. Moreover, to further boost the performance of our CND, we propose a mutual learning framework containing two decoders, i.e., an AR decoder and our CND. We jointly train the two decoders and minimize the KL divergence between their outputs, which enables our CND to learn forward sequential knowledge from the strengthened AR decoder. Experimental results on PHOENIX2014T and CSL-Daily demonstrate that our model consistently outperforms all competitive baselines and achieves 7.92× and 8.02× speed-ups over the AR SLT model, respectively. Our source code is available at https://github.com/yp20000921/CND.
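The mutual learning objective mentioned above, minimizing the KL divergence between the outputs of two decoders, can be sketched as follows. This is only an illustration under stated assumptions. The function names, the direction KL(AR || NAR), the per-token averaging and the small `eps` smoothing are hypothetical choices and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_kl_loss(ar_logits, nar_logits):
    """Average per-token KL(AR || NAR) over a sequence.

    ar_logits and nar_logits are lists of per-token logit vectors,
    one vector per target position, over the same vocabulary.
    """
    per_token = [
        kl_divergence(softmax(a), softmax(n))
        for a, n in zip(ar_logits, nar_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    ar = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    nar = [[1.0, 1.0, 0.1], [0.2, 0.9, 0.8]]
    print(mutual_kl_loss(ar, nar))
```

The loss is zero when both decoders produce identical distributions and grows as they disagree; a symmetric variant would add the KL term in the opposite direction.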
List of keywords 
Natural Language Processing -> NLP: Machine translation and multilinguality
5014
Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning
Temporal knowledge graph (TKG) reasoning aims to predict missing future facts based on historical information and has gained increasing research interest recently. Much work has been done to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the number of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture that models the relation features of TKGs, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between the query subject and each object candidate across historical time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from the path aggregation unit across the timeline, considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state-of-the-art by up to 4.8% absolute in MRR.
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Sequence and graph learning
List of keywords 
Data Mining -> DM: Knowledge graphs and knowledge base completion
Knowledge Representation and Reasoning -> KRR: Learning and reasoning
Machine Learning -> ML: Sequence and graph learning
5048
Reconstruction-Aware Prior Distillation for Semi-supervised Point Cloud Completion
Real-world sensors often produce incomplete, irregular, and noisy point clouds, making point cloud completion increasingly important. However, most existing completion methods rely on large paired datasets for training, which is labor-intensive. This paper proposes RaPD, a novel semi-supervised point cloud completion method that reduces the need for paired datasets. RaPD utilizes a two-stage training scheme, where a deep semantic prior is learned in stage 1 from unpaired complete and incomplete point clouds, and a semi-supervised prior distillation process is introduced in stage 2 to train a completion network using only a small number of paired samples. Additionally, a self-supervised completion module is introduced to improve performance using unpaired incomplete point clouds. Experiments on multiple datasets show that RaPD outperforms previous methods in both homologous and heterologous scenarios.
List of keywords 
Computer Vision -> CV: 3D computer vision
5051
Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion
Transformer-based methods have proven to be effective in achieving long-distance modeling, capturing the spatial and spectral information, and exhibiting strong inductive bias in various computer vision tasks. Generally, the Transformer model includes two common modes of multi-head self-attention (MSA): spatial MSA (Spa-MSA) and spectral MSA (Spe-MSA). However, Spa-MSA is computationally efficient but limits the global spatial response within a local window. On the other hand, Spe-MSA can calculate channel self-attention to accommodate high-resolution images, but it disregards the crucial local information that is essential for low-level vision tasks. In this study, we propose a bidirectional dilation Transformer (BDT) for multispectral and hyperspectral image fusion (MHIF), which aims to leverage the advantages of both MSA and the latent multiscale information specific to MHIF tasks. The BDT consists of two designed modules: the dilation Spa-MSA (D-Spa), which dynamically expands the spatial receptive field through a given hollow strategy, and the grouped Spe-MSA (G-Spe), which extracts latent features within the feature map and learns local data behavior. Additionally, to fully exploit the multiscale information from both inputs with different spatial resolutions, we employ a bidirectional hierarchy strategy in the BDT, resulting in improved performance. Finally, extensive experiments on two commonly used datasets, CAVE and Harvard, demonstrate the superiority of BDT both visually and quantitatively. Furthermore, the related code will be available at the GitHub page of the authors.
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Multi-modal learning
List of keywords 
Machine Learning -> ML: Attention models
Computer Vision -> CV: Machine learning for vision
Machine Learning -> ML: Multi-modal learning
5088
Revenue Maximization Mechanisms for an Uninformed Mediator with Communication Abilities
Consider a market where a seller owns an item for sale and a buyer wants to purchase it. Each player has private information, known as their type. It can be costly and difficult for the players to reach an agreement through direct communication. However, with a mediator as a trusted third party, both players can communicate privately with the mediator without worrying about leaking too much or too little information. The mediator can design and commit to a multi-round communication protocol for both players, in which they update their beliefs about the other player’s type. The mediator cannot force the players to trade but can influence their behaviors by sending messages to them.
We study the problem of designing revenue-maximizing mechanisms for the mediator. We show that the mediator can, without loss of generality, focus on a set of direct and incentive-compatible mechanisms. We then formulate this problem as a mathematical program and provide an optimal solution in closed form under a regularity condition. Our mechanism is simple and has a threshold structure. We also discuss some interesting properties of the optimal mechanism, such as situations where the mediator may lose money.
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Mechanism design
Game Theory and Economic Paradigms -> GTEP: Auctions and market-based systems
Multidisciplinary Topics and Applications -> MDA: Economics
5090
Orientation-Independent Chinese Text Recognition in Scene Images
Scene text recognition (STR) has attracted much attention due to its broad applications. Previous works pay more attention to recognizing Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Unlike Latin texts, many vertical Chinese texts exist in natural scenes, which poses difficulties for current state-of-the-art STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images, thus recognizing both horizontal and vertical texts robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves a 45.63% improvement on VCTR when introducing CIRN to the baseline model.
Computer Vision -> CV: Vision and language
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization)
Computer Vision -> CV: Vision and language
5092
FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks
Heterogeneous graph neural networks (HGNNs) can learn from typed and relational graph data more effectively than conventional GNNs. With larger parameter spaces, HGNNs may require more training data, which is often scarce in real-world applications due to privacy regulations (e.g., GDPR). Federated graph learning (FGL) enables multiple clients to train a GNN collaboratively without sharing their local data. However, existing FGL methods mainly focus on homogeneous GNNs or knowledge graph embeddings; few have considered heterogeneous graphs and HGNNs. In federated heterogeneous graph learning, clients may have private graph schemas. Conventional FL/FGL methods attempting to define a global HGNN model would violate schema privacy. To address these challenges, we propose FedHGN, a novel and general FGL framework for HGNNs. FedHGN adopts schema-weight decoupling to enable schema-agnostic knowledge sharing and employs coefficients alignment to stabilize the training process and improve HGNN performance. With better privacy preservation, FedHGN consistently outperforms local training and conventional FL methods on three widely adopted heterogeneous graph datasets with varying client numbers. The code is available at https://github.com/cynricfu/FedHGN.
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Sequence and graph learning
List of keywords 
Machine Learning -> ML: Federated learning
Data Mining -> DM: Mining graphs
Machine Learning -> ML: Sequence and graph learning
5096
Dual Relation Knowledge Distillation for Object Detection
Knowledge distillation is an effective method for model compression. However, applying knowledge distillation to detection tasks remains challenging. Two key factors lead to poor distillation performance for detection tasks. One is the serious imbalance between foreground and background features; the other is that small objects lack sufficient feature representation. To solve these issues, we propose a new distillation method named dual relation knowledge distillation (DRKD), which includes pixel-wise relation distillation and instance-wise relation distillation. The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation. By distilling the global pixel relation, the student detector can learn the relation between foreground and background features, and avoid the difficulty of distilling features directly under the feature imbalance issue. Besides, we find that instance-wise relation supplements valuable knowledge beyond independent features for small objects. Thus, the instance-wise relation distillation is designed, which calculates the similarity of different instances to obtain a relation matrix. More importantly, a relation filter module is designed to highlight valuable instance relations. The proposed dual relation knowledge distillation is general and can be easily applied to both one-stage and two-stage detectors. Our method achieves state-of-the-art performance, improving Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP and RetinaNet based on ResNet50 from 37.4% to 40.3% mAP on COCO 2017.
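The instance-wise relation matrix described above, built from pairwise similarities between instances, can be sketched as follows. The choice of cosine similarity, the mean-squared-error matching loss and all function names are assumptions for illustration. The abstract does not specify the similarity measure, and the relation filter module is not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relation_matrix(instances):
    """N x N matrix of pairwise similarities between instance features."""
    return [[cosine_similarity(a, b) for b in instances] for a in instances]


def relation_distillation_loss(student_instances, teacher_instances):
    """Mean squared difference between student and teacher relation matrices."""
    s = relation_matrix(student_instances)
    t = relation_matrix(teacher_instances)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    student = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.7]]
    print(relation_distillation_loss(student, teacher))
```

Because the loss compares relations rather than raw features, the student and teacher feature dimensions do not need to match, only the number of instances.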
List of keywords 
Computer Vision -> CV: Recognition (object detection, categorization)
5098
Formal Explanations of Neural Network Policies for Planning
Deep learning is increasingly used to learn policies for planning problems, yet policies represented by neural networks are difficult to interpret, verify and trust. Existing formal approaches to post-hoc explanations provide concise reasons for a single decision made by an ML model. However, understanding planning policies requires explaining sequences of decisions. In this paper, we formulate the problem of finding explanations for the sequence of decisions recommended by a learnt policy in a given state. We show that, under certain assumptions, a minimal explanation for a sequence can be computed by solving a number of single-decision explanation problems that is linear in the length of the sequence. We present experimental results of our implementation of this approach for ASNet policies for classical planning domains.
Machine Learning -> ML: Explainable/Interpretable machine learning
Planning and Scheduling -> PS: Learning in planning and scheduling
List of keywords 
Planning and Scheduling -> PS: Model-based reasoning
Machine Learning -> ML: Explainable/Interpretable machine learning
Planning and Scheduling -> PS: Learning in planning and scheduling
5102
KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
Self-training (ST) has come to fruition in language understanding tasks by producing pseudo labels, which reduces the labeling bottleneck of language model fine-tuning. Nevertheless, in facilitating semi-supervised controllable language generation, ST faces two key challenges. First, augmented by self-generated pseudo text, generation models tend to over-exploit the previously learned text distribution, suffering from mode collapse and poor generation diversity. Second, generating pseudo text in each iteration is time-consuming, severely decelerating the training process. In this work, we propose KEST, a novel and efficient self-training framework to handle these problems. KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator. We demonstrate both theoretically and empirically that KEST can benefit from more diverse pseudo text in an efficient manner, which allows not only refining and exploiting the previously fitted distribution but also enhanced exploration towards a larger potential text space, providing a guarantee of improved performance. Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.
List of keywords 
Natural Language Processing -> NLP: Language generation
5107
Learning Prototype Classifiers for Long-Tailed Recognition
The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real world. Most recent works in LTR use softmax classifiers that are biased in that they correlate classifier norm with the amount of training data for a given class. In this work, we show that learning prototype classifiers addresses the biased softmax problem in LTR. Prototype classifiers can deliver promising results simply using Nearest-Class-Mean (NCM), a special case where prototypes are empirical centroids. We go one step further and propose to jointly learn prototypes by using distances to prototypes in representation space as the logit scores for classification. Further, we theoretically analyze the properties of Euclidean-distance-based prototype classifiers that lead to stable gradient-based optimization which is robust to outliers. To enable independent distance scales along each channel, we enhance Prototype classifiers by learning channel-dependent temperature parameters. Our analysis shows that prototypes learned by Prototype classifiers are better separated than empirical centroids. Results on four LTR benchmarks show that the Prototype classifier outperforms or is comparable to state-of-the-art methods. Our code is made available at https://github.com/saurabhsharma1993/prototype-classifier-ltr.
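The Nearest-Class-Mean special case mentioned above, where prototypes are empirical centroids and distances to prototypes act as logit scores, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names and toy data are hypothetical, and the learned prototypes and channel-dependent temperatures from the abstract are deliberately left out.

```python
import math


def class_centroids(features, labels):
    """Compute one prototype per class as the mean of its feature vectors."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def distance_logits(x, prototypes):
    """Use the negative Euclidean distance to each prototype as the logit."""
    return {y: -math.dist(x, p) for y, p in prototypes.items()}


def predict(x, prototypes):
    """Predict the class whose prototype is nearest to x."""
    logits = distance_logits(x, prototypes)
    return max(logits, key=logits.get)


if __name__ == "__main__":
    # Toy long-tailed data: two "head" samples, one "tail" sample.
    feats = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
    labels = ["head", "head", "tail"]
    protos = class_centroids(feats, labels)
    print(protos, predict([9.0, 9.0], protos))
```

Because each class contributes exactly one prototype regardless of how many samples it has, the decision rule does not depend on a per-class weight norm, which is the bias the abstract attributes to softmax classifiers.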
Machine Learning -> ML: Few-shot learning
List of keywords 
Computer Vision -> CV: Transfer, low-shot, semi- and un- supervised learning    Machine Learning -> ML: Few-shot learning
5126
Learning Attention from Attention: Efficient Self-Refinement Transformer for Face Super-Resolution
Recently, Transformer-based architectures have been introduced into the face super-resolution task due to their advantage in capturing long-range dependencies. However, these approaches tend to integrate global information over a large search region, which neglects the most relevant information and induces blur from irrelevant textures. Some improved methods simply constrain self-attention to a local window to suppress the useless information, but this also limits the capability of recovering high-frequency details when flat areas dominate the local search window. To address these issues, we propose a novel self-refinement mechanism that adaptively achieves texture-aware reconstruction in a coarse-to-fine procedure. Specifically, primary self-attention is first conducted to reconstruct the coarse-grained textures and detect the fine-grained regions that require further compensation. Then, region selection attention is performed to refine the textures in these key regions. Since self-attention treats the channel information on tokens equally, we employ a dual-branch feature integration module to prioritize the important channels in feature extraction. Furthermore, we design a wavelet fusion module which integrates shallow-layer structure and deep-layer detailed features to recover realistic face images in the frequency domain. Extensive experiments demonstrate the effectiveness of our method on a variety of datasets.
Computer Vision -> CV: Computational photography
List of keywords 
Computer Vision -> CV: Biometrics, face, gesture and pose recognition Computer Vision -> CV: Computational photography
5145
FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer
Federated Learning (FL) has attracted wide attention because it enables decentralized learning while ensuring data privacy. However, most existing methods unrealistically assume that the classes encountered by local clients are fixed over time. Under this impractical assumption, the model suffers severe catastrophic forgetting of old classes after learning new ones. Moreover, due to the limitation of communication cost, it is challenging to use large-scale models in FL, which affects the prediction accuracy. To address these challenges, we propose a novel framework, Federated Enhanced Transformer (FedET), which simultaneously achieves high accuracy and low communication cost. Specifically, FedET uses Enhancer, a tiny module, to absorb and communicate new knowledge, and applies pre-trained Transformers combined with different Enhancers to ensure high precision on various tasks. To address local forgetting caused by new classes of new tasks and global forgetting brought by non-i.i.d. class imbalance across different local clients, we propose an Enhancer distillation method to correct the imbalance between old and new knowledge and repair the non-i.i.d. problem. Experimental results demonstrate that FedET’s average accuracy on a representative benchmark dataset is 14.1% higher than the state-of-the-art method, while FedET saves 90% of the communication cost compared to the previous method.
Machine Learning -> ML: Incremental learning
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Incremental learning
5148
Intent-aware Recommendation via Disentangled Graph Contrastive Learning
Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to their powerful ability to learn from user behavior data. Understanding user intents from behavior data is the key to recommender systems, and it poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents, especially when user behavior data is usually inadequate in reality. The other is how to relate behaviors to intents for a more explainable recommender system, given that different behaviors have different intent distributions. In this paper, we present Intent-aware Recommendation via Disentangled Graph Contrastive Learning (IDCL), which simultaneously learns interpretable intents and behavior distributions over those intents. Specifically, we first model the user behavior data as a user-item-concept graph, and design a GNN-based behavior disentangling module to learn the different intents. Then we propose intent-wise contrastive learning to enhance the intent disentangling and meanwhile infer the behavior distributions. Finally, coding rate reduction regularization is introduced to make the behaviors of different intents orthogonal. Extensive experiments demonstrate the effectiveness of IDCL in terms of both substantial performance improvement and interpretability.
Data Mining -> DM: Networks
Data Mining -> DM: Recommender systems
List of keywords 
Data Mining -> DM: Mining graphs Data Mining -> DM: Networks
Data Mining -> DM: Recommender systems
5155
Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data
To tackle the global climate challenge, there is an urgent need to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite this urgency, the heterogeneity of meteorological sensors across countries and regions, which inevitably causes multivariate heterogeneity and data exposure, remains the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach is proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism is adopted to satisfy low-resourced sensors’ communication and computational constraints. The effectiveness of the proposed method is demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.
Machine Learning -> ML: Time series and data streams
List of keywords 
Machine Learning -> ML: Federated learning Machine Learning -> ML: Time series and data streams
5164
Learning to Binarize Continuous Features for Neuro-Rule Networks
Neuro-Rule Networks (NRNs) have emerged as a promising neuro-symbolic method, valued for their ability to equate fully-connected neural networks with logic rules. To support learning logic rules consisting of boolean variables, input features must be converted into binary representations. Unlike discrete features, which can be directly transformed by one-hot encodings, continuous features need to be binarized based on some numerical intervals. Existing studies usually select the bound values of intervals based on empirical strategies (e.g., equal-width intervals). However, this is not optimal since the bounds are fixed and cannot be optimized to accommodate the ultimate training target. In this paper, we propose AutoInt, an approach that automatically binarizes continuous features and enables the intervals to be optimized with NRNs in an end-to-end fashion. Specifically, AutoInt automatically selects an interval for a given continuous feature in a soft manner to enable a differentiable learning procedure of interval-related parameters. Moreover, it introduces an additional soft K-means clustering loss to make the interval centres approach the original feature value distribution, thus reducing the risk of overfitting intervals. We conduct comprehensive experiments on public datasets and demonstrate the effectiveness of AutoInt in boosting the performance of NRNs.
List of keywords 
Machine Learning -> ML: Neuro-symbolic methods
5168
A Fast Maximum k-Plex Algorithm Parameterized by the Degeneracy Gap
Given a graph, a k-plex is a vertex set in which each vertex is non-adjacent to at most k-1 other vertices in the set. The maximum k-plex problem, which asks for the largest k-plex in a given graph, is an important but computationally challenging problem in applications like graph search and community detection. So far, there are a number of empirical algorithms without sufficient theoretical explanation of their efficiency. We try to bridge this gap by defining a novel parameter of the input instance, g_k(G), the gap between the degeneracy bound and the size of the maximum k-plex in the given graph, and presenting an exact algorithm parameterized by g_k(G). In other words, we design an algorithm with running time polynomial in the size of the input graph and exponential in g_k(G), where k is a constant. Usually, g_k(G) is small and bounded by O(log(|V|)) in real-world graphs, indicating that the algorithm runs in polynomial time. We also carry out extensive experiments and show that the algorithm is competitive with the state-of-the-art solvers. Additionally, for large k values such as 15 and 20, our algorithm has superior performance over existing algorithms.
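The k-plex definition at the start of this abstract can be checked directly. The sketch below is only a membership test for a candidate vertex set, not the parameterized algorithm of the paper; the adjacency-dictionary representation and the function name are illustrative assumptions.

```python
def is_k_plex(adj, vertices, k):
    """Return True if `vertices` is a k-plex of the graph `adj`.

    `adj` maps each vertex to the set of its neighbours (hypothetical
    representation). A set is a k-plex when every member is non-adjacent
    to at most k-1 other members.
    """
    members = set(vertices)
    for v in members:
        non_neighbours = sum(
            1 for u in members if u != v and u not in adj[v]
        )
        if non_neighbours > k - 1:
            return False
    return True


# A path 0 - 1 - 2: the endpoints are not adjacent to each other.
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

For k = 1 the test coincides with checking for a clique, so the path on three vertices fails it, while for k = 2 the same set passes because each endpoint misses only one other vertex.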
Search -> S: Combinatorial search and optimisation
List of keywords 
Search -> S: Heuristic search Search -> S: Combinatorial search and optimisation
5171
MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning
Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. However, in multi-agent reinforcement learning (MARL), these techniques face challenges because each agent only receives partial observation from an environment influenced by others, resulting in correlated observations in the agent dimension. So it is necessary to consider agent-level information in representation learning for MARL. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space. Specifically, we use an attention reconstruction model for recovering and the model is trained via contrastive learning. MA2CL allows better utilization of contextual information at the agent level, facilitating the training of MARL agents for cooperation tasks. Extensive experiments demonstrate that our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Representation learning
List of keywords 
Machine Learning -> ML: Deep reinforcement learning Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Machine Learning -> ML: Representation learning
5176
Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video
Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e.g., summaries), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with visual information, which is often obtained through a general-purpose visual feature extractor. However, such generically extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning a summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge distilled from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model achieves a significant improvement on small datasets or even datasets with limited training data.
Machine Learning -> ML: Multi-modal learning
List of keywords 
Natural Language Processing -> NLP: Summarization Machine Learning -> ML: Multi-modal learning
5195
More for Less: Safe Policy Improvement with Stronger Performance Guarantees
In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated.
State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy’s performance.
We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. 
Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI.
Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Planning under uncertainty
List of keywords 
Machine Learning -> ML: Reinforcement learning Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Planning under uncertainty
5220
Stochastic Population Update Can Provably Be Helpful in Multi-Objective Evolutionary Algorithms
Evolutionary algorithms (EAs) have been widely and successfully applied to solve multi-objective optimization problems, due to their nature of population-based search. Population update is a key component in multi-objective EAs (MOEAs), and it is usually performed in a greedy, deterministic manner. That is, the next-generation population is formed by selecting the first population-size ranked solutions (based on some selection criteria, e.g., non-dominated sorting, crowdedness and indicators) from the collection of the current population and newly-generated solutions. In this paper, we question this practice. We show analytically that introducing randomness into the population update procedure in MOEAs can be beneficial for the search. More specifically, we prove that the expected running time of a well-established MOEA (SMS-EMOA) for solving a commonly studied bi-objective problem, OneJumpZeroJump, can be exponentially decreased by replacing its deterministic population update mechanism with a stochastic one. Empirical studies also verify the effectiveness of the proposed stochastic population update method. This work is an attempt to challenge a common practice for the population update in MOEAs. Its positive results, which might hold more generally, should encourage the exploration of developing new MOEAs in the area.
List of keywords 
Search -> S: Evolutionary computation
5225
Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction
Organic reaction prediction is a critical task in drug discovery. Recently, researchers have achieved non-autoregressive reaction prediction by modeling the redistribution of electrons, resulting in state-of-the-art top-1 accuracy, and enabling parallel sampling. However, the current non-autoregressive decoder does not satisfy two essential rules of electron redistribution modeling simultaneously: the electron-counting rule and the symmetry rule. This violation of the physical constraints of chemical reactions impairs model performance. In this work, we propose a new framework called ReactionSink that combines two doubly stochastic self-attention mappings to obtain electron redistribution predictions that follow both constraints. We further extend our solution to a general multi-head attention mechanism with augmented constraints. To achieve this, we apply Sinkhorn’s algorithm to iteratively update self-attention mappings, which imposes doubly conservative constraints as additional informative priors on electron redistribution modeling. We theoretically demonstrate that our ReactionSink can simultaneously satisfy both rules, which the current decoder mechanism cannot do. Empirical results show that our approach consistently improves the predictive performance of non-autoregressive models and does not bring an unbearable additional computational cost.
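To make the role of Sinkhorn's algorithm concrete, the sketch below turns a square matrix of raw attention scores into an approximately doubly stochastic matrix by alternately normalizing rows and columns. It illustrates only the generic Sinkhorn iteration; the exponentiation step, the iteration count, and the function name are assumptions, and the way ReactionSink combines two such mappings or enforces the symmetry rule is not reproduced here.

```python
import math


def sinkhorn(scores, num_iters=100):
    """Alternately normalise rows and columns of exp(scores).

    For a square matrix with strictly positive entries the iteration
    converges to a doubly stochastic matrix (all rows and all columns
    sum to 1).
    """
    n = len(scores)
    # Exponentiate so that every entry is strictly positive (assumption).
    m = [[math.exp(v) for v in row] for row in scores]
    for _ in range(num_iters):
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Hypothetical 3x3 attention scores between three atoms.
attention = sinkhorn([[0.0, 1.0, 2.0], [1.0, 0.5, 0.0], [2.0, 0.0, 1.0]])
```

A standard softmax attention map normalizes rows only; the column normalization is what adds the second conservation constraint.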
Machine Learning -> ML: Structured prediction
Machine Learning -> ML: Attention models
Multidisciplinary Topics and Applications -> MDA: Physical sciences
List of keywords 
Machine Learning -> ML: Applications Machine Learning -> ML: Structured prediction
Machine Learning -> ML: Attention models
Multidisciplinary Topics and Applications -> MDA: Physical sciences
5253
Runtime Analyses of Multi-Objective Evolutionary Algorithms in the Presence of Noise
In single-objective optimization, it is well known that evolutionary algorithms, even without further adjustments, can tolerate a certain amount of noise in the evaluation of the objective function. In contrast, this question is not at all understood for multi-objective optimization.
 In this work, we conduct the first mathematical runtime analysis of a simple multi-objective evolutionary algorithm (MOEA) on a classic benchmark in the presence of noise in the objective function. 
 We prove that when bit-wise prior noise with rate p <= alpha/n, alpha a suitable constant, is present, the simple evolutionary multi-objective optimizer (SEMO) without any adjustments to cope with noise finds the Pareto front of the OneMinMax benchmark in time O(n^2 log n), just as in the case without noise. Given that the problem here is to arrive at a population consisting of n+1 individuals witnessing the Pareto front, this is a surprisingly strong robustness to noise (comparably simple evolutionary algorithms cannot optimize the single-objective OneMax problem in polynomial time when p = omega(log(n)/n)). Our proofs suggest that the strong robustness of the MOEA stems from its implicit diversity mechanism designed to enable it to compute a population covering the whole Pareto front. 
 
Interestingly, this result holds only when the objective value of a solution is determined only once and the algorithm from that point on works with this, possibly noisy, objective value. We prove that when all solutions are reevaluated in each iteration, then any noise rate p = omega(log(n)/n^2) leads to a super-polynomial runtime. This is very different from single-objective optimization, where it is generally preferred to reevaluate solutions whenever their fitness is important and where examples are known such that not reevaluating solutions can lead to catastrophic performance losses.
List of keywords 
Search -> S: Heuristic search
5260
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) research has achieved great success recently by improving the noise-robustness of audio-only automatic speech recognition (ASR) with noise-invariant visual information. However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for the downstream speech recognition task. In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. Specifically, we design a global interaction model to capture the A-V complementary relationship at the modality level, as well as a local alignment approach to model the A-V temporal consistency at the frame level. Such a holistic view of cross-modal correlations enables better multimodal representations for AVSR. Experiments on public benchmarks LRS3 and LRS2 show that our GILA outperforms the supervised learning state-of-the-art. Code is at https://github.com/YUCHEN005/GILA.
List of keywords 
Natural Language Processing -> NLP: Speech
5274
Parameterized Local Search for Max c-Cut
In the NP-hard Max c-Cut problem, one is given an undirected edge-weighted graph G and wants to color the vertices of G with c colors such that the total weight of edges with distinctly colored endpoints is maximal. The case with c=2 is the famous Max Cut problem. To deal with the NP-hardness of this problem, we study parameterized local search algorithms. More precisely, we study LS-Max c-Cut where we are additionally given a vertex coloring f and an integer k and the task is to find a better coloring f’ that differs from f in at most k entries, if such a coloring exists; otherwise, f is k-optimal. We show that LS-Max c-Cut presumably cannot be solved in g(k) · nᴼ⁽¹⁾ time even on bipartite graphs, for all c ≥ 2. We then show an algorithm for LS-Max c-Cut with running time O((3eΔ)ᵏ · c · k³ · Δ · n), where Δ is the maximum degree of the input graph. Finally, we evaluate the practical performance of this algorithm in a hill-climbing approach as a post-processing step for state-of-the-art heuristics for Max c-Cut. We show that using parameterized local search, the results of these heuristics can be further improved on a set of standard benchmark instances.
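The objective and the local search step described in this abstract can be sketched for the simplest case k = 1, where a better coloring may differ from the current one in a single vertex. This is a brute-force illustration under assumed names and an assumed edge-list representation; it is not the paper's O((3eΔ)ᵏ · c · k³ · Δ · n) algorithm.

```python
def cut_weight(edges, coloring):
    """Total weight of edges whose endpoints have different colours."""
    return sum(w for u, v, w in edges if coloring[u] != coloring[v])


def improve_one_flip(edges, coloring, c):
    """Return a strictly better colouring that differs in one vertex,
    or None if the colouring is 1-optimal."""
    base = cut_weight(edges, coloring)
    for v in range(len(coloring)):
        for colour in range(c):
            if colour == coloring[v]:
                continue
            candidate = coloring[:v] + [colour] + coloring[v + 1:]
            if cut_weight(edges, candidate) > base:
                return candidate
    return None


def hill_climb(edges, coloring, c):
    """Apply improving single-vertex flips until none exists."""
    current = list(coloring)
    while True:
        better = improve_one_flip(edges, current, c)
        if better is None:
            return current
        current = better


# Unit-weight triangle, given as (u, v, weight) triples.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
two_colours = hill_climb(triangle, [0, 0, 0], 2)
three_colours = hill_climb(triangle, [0, 0, 0], 3)
```

The loop terminates because every accepted flip strictly increases the cut weight and there are finitely many colorings. Larger values of k enlarge the neighbourhood and can escape colorings that are only 1-optimal.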
List of keywords 
Search -> S: Local search
5281
Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification
Multiple Instance Learning (MIL) and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification. However, unlike human pathologists who selectively observe specific regions of histopathology tissues under different magnifications, most methods do not incorporate multiple resolutions of the WSIs, hierarchically and attentively, thereby leading to a loss of focus on the WSIs and information from other resolutions. To resolve this issue, we propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs. This framework can dynamically and attentively discover the discriminative regions across multiple resolutions of the WSIs. Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer and obtain a more holistic WSI (bag) representation. This transformer consists of multiple Integrated Attention Modules, each of which combines a transformer layer with an aggregation module that produces a bag representation based on every instance representation in that bag. The experimental results show that our method achieves state-of-the-art performance on multiple datasets, including Camelyon16, TCGA-RCC, TCGA-NSCLC, and an in-house IMGC dataset. The code is available at https://github.com/BearCleverProud/HAG-MIL.
Machine Learning -> ML: Attention models
Machine Learning -> ML: Weakly supervised learning
List of keywords 
Computer Vision -> CV: Biomedical image analysis Machine Learning -> ML: Attention models
Machine Learning -> ML: Weakly supervised learning
5292
Optimal Decision Tree Policies for Markov Decision Processes
Interpretability of reinforcement learning policies is essential for many real-world tasks, but learning such interpretable policies is a hard problem. In particular, rule-based policies such as decision trees and rule lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a policy that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that on small MDPs, depth 3 OMDTs often perform close to optimally.
Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Combinatorial search and optimisation
List of keywords 
Planning and Scheduling -> PS: Markov decisions processes Machine Learning -> ML: Explainable/Interpretable machine learning
Search -> S: Combinatorial search and optimisation
5293
Proportionality Guarantees in Elections with Interdependent Issues
We consider a multi-issue election setting over a set of possibly interdependent issues with the goal of achieving proportional representation of the views of the electorate. To this end, we employ a proportionality criterion suggested recently in the literature, that guarantees fair representation for all groups of voters of sufficient size. For this criterion, there exist rules that perform well in the case where all the issues have a binary domain and are independent of each other. In particular, this has been shown for Proportional Approval Voting (PAV) and for the Method of Equal Shares (MES). In this paper, we go two steps further: we generalize these guarantees for issues with a non-binary domain, and, most importantly, we consider extensions to elections with dependencies among issues, where we identify restrictions that lead to analogous results. To achieve this, we define appropriate generalizations of PAV and MES to handle conditional ballots. In addition to proportionality considerations, we also examine the computational properties of the conditional version of MES. Our findings indicate that the conditional case poses additional challenges and differs significantly from the unconditional one, both in terms of proportionality guarantees and computational complexity.
List of keywords 
Game Theory and Economic Paradigms -> GTEP: Computational social choice
5308
Can I Really Do That? Verification of Meta-Operators via Stackelberg Planning
Macro-operators are a common reformulation method in planning that adds high-level operators corresponding to a fixed sequence of primitive operators. We introduce meta-operators, which allow using different sequences of actions in each state. We show how to automatically verify whether a meta-operator is valid, i.e., the represented behavior is always doable. This can be checked at once for all instantiations of the meta-operator and all reachable states via a compilation into Stackelberg planning, a form of adversarial planning. Our results show that meta-operators learned for multiple domains can often express useful high-level behaviors very compactly, improving planners’ performance.
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Search in planning and scheduling
List of keywords 
Planning and Scheduling -> PS: Planning algorithms Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Search in planning and scheduling
5311
The First Proven Performance Guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a Combinatorial Optimization Problem
The Non-dominated Sorting Genetic Algorithm-II (NSGA-II) is one of the most prominent algorithms to solve multi-objective optimization problems. Recently, the first mathematical runtime guarantees have been obtained for this algorithm, but only for synthetic benchmark problems.
In this work, we give the first proven performance guarantees for a classic optimization problem, the NP-complete bi-objective minimum spanning tree problem. More specifically, we show that the NSGA-II with population size N >= 4((n-1) wmax + 1) computes all extremal points of the Pareto front in an expected number of O(m^2 n wmax log(n wmax)) iterations, where n is the number of vertices, m the number of edges, and wmax is the maximum edge weight in the problem instance. This result confirms, via mathematical means, the good performance of the NSGA-II observed empirically. It also shows that mathematical analyses of this algorithm are not only possible for synthetic benchmark problems, but also for more complex combinatorial optimization problems. 
  
As a side result, we also obtain a new analysis of the performance of the global SEMO algorithm on the bi-objective minimum spanning tree problem, which improves the previous best result by a factor of |F|, the number of extremal points of the Pareto front, a set that can be as large as n wmax. The main reason for this improvement is our observation that both multi-objective evolutionary algorithms find the different extremal points in parallel rather than sequentially, as assumed in the previous proofs.
Search -> S: Heuristic search
List of keywords 
Search -> S: Evolutionary computation Search -> S: Heuristic search
5323
Sampling Ex-Post Group-Fair Rankings
Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution D that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-k ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values. 
We propose two algorithms to sample a random group-fair ranking from the distribution D mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time O(k^2 ell), where "ell" is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution epsilon-close to D in total variation distance and has expected running time O*(k^2 ell^2), when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of k. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.
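To give a feel for how a dynamic program can work within the stated O(k^2 ell) budget, the sketch below counts the ways to label the top-k positions with groups so that each group's count lies between its lower and upper bound. Treating a ranking as a sequence of group labels, as well as the function and parameter names, are assumptions made for illustration; the abstract does not describe the paper's actual DP or whether the distribution D is uniform over such sequences.

```python
from math import comb


def count_group_patterns(k, lower, upper):
    """Count length-k sequences of group labels in which group j
    appears between lower[j] and upper[j] times.

    ways[s] = number of ways to fill s of the k positions using only
    the groups processed so far.
    """
    ways = [0] * (k + 1)
    ways[0] = 1
    for lo, hi in zip(lower, upper):
        new_ways = [0] * (k + 1)
        for used in range(k + 1):
            if ways[used] == 0:
                continue
            # Give this group c of the k - used positions still free.
            for c in range(lo, min(hi, k - used) + 1):
                new_ways[used + c] += ways[used] * comb(k - used, c)
        ways = new_ways
    return ways[k]


# Hypothetical instance: top-3 ranks, two groups, group 0 needs 1 or 2 slots.
patterns = count_group_patterns(3, [1, 0], [2, 3])
```

The table has ell layers of k + 1 entries and each entry tries at most k + 1 counts, which gives the O(k^2 ell) shape. A uniform sample over these patterns could then be drawn by walking the table backwards and choosing each group's count with probability proportional to the number of completions.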
AI Ethics, Trust, Fairness -> ETF: Bias
Search -> S: Combinatorial search and optimisation
List of keywords 
AI Ethics, Trust, Fairness -> ETF: Fairness and diversity AI Ethics, Trust, Fairness -> ETF: Bias
Search -> S: Combinatorial search and optimisation
5367
FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training
This study proposes FEDBFPT (Federated BERT Further Pre-Training), a Federated Learning (FL) framework for further pre-training the BERT language model in specialized domains while addressing privacy concerns. FEDBFPT enables multiple clients to collaboratively train the shallower layers of BERT, which are crucial in the pre-training stage, without the need to share private data. To achieve this, FEDBFPT involves building a local model for each client, progressively training the shallower layers of local models while sampling deeper layers, and aggregating trained parameters on a server to create the final global model. This approach utilizes multiple smaller local models to further pre-train a global model targeted at specific tasks via fine-tuning, resulting in a reduction in resource usage while maintaining model accuracy. Theoretical analysis is conducted to support the efficiency of FEDBFPT, and experiments are conducted on corpora across domains such as medicine, biology, and computer science. Results indicate that FEDBFPT achieves performance levels comparable to traditional FL methods while reducing computation and communication costs by 46.70% and 7.04%, respectively, even approaching the performance of centralized training models. The source code is released at https://github.com/Hanzhouu/FedBFPT.
Natural Language Processing -> NLP: Applications
List of keywords 
Machine Learning -> ML: Federated learning Natural Language Processing -> NLP: Applications
3151
Learning Dissemination Strategies for External Sources in Opinion Dynamic Models with Cognitive Biases
The opinions of members of a population are influenced by opinions of their peers, their own predispositions, and information from external sources via one or more information channels (e.g., news, social media). Due to individual cognitive biases, the perceptual impact of and importance assigned by agents to information on each channel can be different. In this paper, we propose a model of opinion evolution that uses prospect theory to represent perception of information from the external source along each channel. Our prospect-theoretic model reflects traits observed in humans such as loss aversion, assigning inflated (deflated) values to low (high) probability events, and evaluating outcomes relative to an individually known reference point. We consider the problem of determining information dissemination strategies for the external source to adopt in order to drive opinions of individuals towards a desired value. However, computing a strategy faces the challenge that agents’ initial predispositions and the functions characterizing their perceptions of disseminated information might be unknown. We overcome this challenge by using Gaussian process learning to estimate these unknown parameters. When the external source sends information over multiple channels, the problem of jointly selecting optimal dissemination strategies is, in general, combinatorial. We prove that this problem is submodular, and design near-optimal dissemination algorithms. We evaluate our model on three different widely used large graphs that represent real-world social interactions. Our results indicate that the external source can effectively drive opinions towards a desired value when using prospect-theory-based dissemination strategies.
Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Other
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Agent theories and models Agent-based and Multi-agent Systems -> MAS: Coordination and cooperation
Agent-based and Multi-agent Systems -> MAS: Other
234
Learning to Send Reinforcements: Coordinating Multi-Agent Dynamic Police Patrol Dispatching and Rescheduling via Reinforcement Learning
We address the problem of coordinating multiple agents in dynamic police patrol scheduling via a Reinforcement Learning (RL) approach. Our approach utilizes Multi-Agent Value Function Approximation (MAVFA) with a rescheduling heuristic to learn dispatching and rescheduling policies jointly. Often, police operations are divided into multiple sectors for more effective and efficient operations. In a dynamic setting, incidents occur throughout the day across different sectors, disrupting initially planned patrol schedules. To maximize policing effectiveness, police agents from different sectors cooperate by sending reinforcements to support one another in their incident response and even routine patrol. This poses an interesting research challenge: how to make such complex dispatching and rescheduling decisions involving multiple agents in a coordinated fashion within an operationally reasonable time. Unlike existing Multi-Agent RL (MARL) approaches, which solve similar problems by decomposing either the problem or the action into multiple components, our approach learns the dispatching and rescheduling policies jointly without any decomposition step. In addition, instead of directly searching over the joint action space, we incorporate an iterative best response procedure as a decentralized optimization heuristic and an explicit coordination mechanism for scalable and coordinated decision-making. We evaluate our approach against the commonly adopted two-stage approach and conduct a series of ablation studies to ascertain the effectiveness of our proposed learning and coordination mechanisms.
Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Applications
List of keywords 
Agent-based and Multi-agent Systems -> MAS: Multi-agent learning Planning and Scheduling -> PS: Learning in planning and scheduling
Planning and Scheduling -> PS: Applications
4287
Learning When to Advise Human Decision Makers
Artificial intelligence (AI) systems are increasingly used for providing advice to facilitate human decision making in a wide range of domains, such as healthcare, criminal justice, and finance. Motivated by limitations of the current practice where algorithmic advice is provided to human users as a constant element in the decision-making pipeline, in this paper we raise the question of when algorithms should provide advice. We propose a novel design of AI systems in which the algorithm interacts with the human user in a two-sided manner and aims to provide advice only when it is likely to be beneficial for the user in making their decision. The results of a large-scale experiment show that our advising approach manages to provide advice at times of need and to significantly improve human decision making compared to fixed, non-interactive advising approaches. This approach has additional advantages in facilitating human learning, preserving complementary strengths of human decision makers, and leading to more positive responsiveness to the advice.
Humans and AI -> HAI: Applications
List of keywords 
Humans and AI -> HAI: Human-AI collaboration Humans and AI -> HAI: Applications
141
Causal-Based Supervision of Attention in Graph Neural Network: A Better and Simpler Choice towards Powerful Attention
Recent years have witnessed the great potential of attention mechanisms in graph representation learning. However, while variants of attention-based GNNs are setting new benchmarks for numerous real-world datasets, recent works have pointed out that their induced attentions are less robust and generalizable against noisy graphs due to a lack of direct supervision. In this paper, we present a new framework which utilizes the tool of causality to provide a powerful supervision signal for the learning process of attention functions. Specifically, we estimate the direct causal effect of attention on the final prediction, and then maximize this effect to guide attention toward more meaningful neighbors. Our method can serve as a plug-and-play module for any canonical attention-based GNN in an end-to-end fashion. Extensive experiments on a wide range of benchmark datasets illustrate that, by directly supervising attention functions, the model is able to converge faster with a clearer decision boundary, and thus yields better performance.
List of keywords 
Data Mining -> DM: Mining graphs 