Tuesday 22nd August
Tuesday 22nd August
11:45-12:45
Machine Learning (1/12)
#1340
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval by the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks. This is due to the lack of decoder architecture and pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap between the CLIP representations of different modalities and the inability of CLIP to model the offset of this gap, which results in the failure of the concept to transfer across modes. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With vision-free unsupervised training, Knight achieves state-of-the-art performance in zero-shot methods for image captioning and video captioning.
#SV5630
Graph-based Molecular Representation Learning
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors preserving the molecular structures and features, on top of which the downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.
#292
Recognizable Information Bottleneck
Information Bottlenecks (IBs) learn representations that generalize to unseen data by information compression. However, existing IBs are practically unable to guarantee generalization in real-world scenarios due to the vacuous generalization bound. The recent PAC-Bayes IB uses information complexity instead of information compression to establish a connection with the mutual information generalization bound. However, it requires the computation of expensive second-order curvature, which hinders its practical application. In this paper, we establish the connection between the recognizability of representations and the recent functional conditional mutual information (f-CMI) generalization bound, which is significantly easier to estimate. On this basis we propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic optimized by density ratio matching under the Bregman divergence. Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.
#2927
ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks
Multistep prediction models are essential for the simulation and model-predictive control of dynamical systems. Verifying the safety of such models is a multi-faceted problem requiring both system-theoretic guarantees as well as establishing trust with human users. In this work, we propose a novel approach, ReLiNet (Recurrent Linear Parameter Varying Network), to ensure safety for multistep prediction of dynamical systems. Our approach simplifies a recurrent neural network to a switched linear system that is constrained to guarantee exponential stability, which acts as a surrogate for safety from a system-theoretic perspective. Furthermore, ReLiNet’s computation can be reduced to a single linear model for each time step, resulting in predictions that are explainable by definition, thereby establishing trust from a human-centric perspective. Our quantitative experiments show that ReLiNet achieves prediction accuracy comparable to that of state-of-the-art recurrent neural networks, while achieving more faithful and robust explanations compared to the model-agnostic explanation method of LIME.
#705
Some Might Say All You Need Is Sum
The expressivity of Graph Neural Networks (GNNs) is dependent on the aggregation functions they employ. Theoretical works have pointed towards Sum aggregation GNNs subsuming every other GNNs, while certain practical works have observed a clear advantage to using Mean and Max. An examination of the theoretical guarantee identifies two caveats. First, it is size-restricted, that is, the power of every specific GNN is limited to graphs of a specific size. Successfully processing larger graphs may require an other GNN, and so on. Second, it concerns the power to distinguish non-isomorphic graphs, not the power to approximate general functions on graphs, and the former does not necessarily imply the latter.
It is desired that a GNN’s usability will not be limited to graphs of any specific size. Therefore, we explore the realm of unrestricted-size expressivity. We prove that basic functions, which can be computed exactly by Mean or Max GNNs, are inapproximable by any Sum GNN. We prove that under certain restrictions, every Mean or Max GNN can be approximated by a Sum GNN, but even there, a combination of (Sum, [Mean/Max]) is more expressive than Sum alone. Lastly, we prove further expressivity limitations for GNNs with a broad class of aggregations.
#382
Learning to Learn from Corrupted Data for Few-Shot Learning
Few-shot learning which aims to generalize knowledge learned from annotated base training data to recognize unseen novel classes has attracted considerable attention. Existing few-shot methods rely on completely clean training data. However, in the real world, the training data are always corrupted and accompanied by noise due to the disturbance in data transmission and low-quality annotation, which severely degrades the performance and generalization capability of few-shot models. To address the problem, we propose a unified peer-collaboration learning (PCL) framework to extract valid knowledge from corrupted data for few-shot learning. PCL leverages two modules to mimic the peer collaboration process which cooperatively evaluates the importance of each sample. Specifically, each module first estimates the importance weights of different samples by encoding the information provided by the other module from both global and local perspectives. Then, both modules leverage the obtained importance weights to guide the reevaluation of the loss value of each sample. In this way, the peers can mutually absorb knowledge to improve the robustness of few-shot models. Experiments verify that our framework combined with different few-shot methods can significantly improve the performance and robustness of original models.
Tuesday 22nd August
11:45-12:45
ML: Federated Learning (1/3)
#704
FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning
Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
#203
FedSampling: A Better Sampling Strategy for Federated Learning
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different data sizes, and the clients with more data cannot have more opportunities to contribute to model training, which may lead to inferior performance. In this paper, instead of client uniform sampling, we propose a novel data uniform sampling strategy for federated learning (FedSampling), which can effectively improve the performance of federated learning especially when client data size distribution is highly imbalanced across clients. In each federated learning round, local data on each client is randomly sampled for local model learning according to a probability based on the server desired sample size and the total sample size on all available clients. Since the data size on each client is privacy-sensitive, we propose a privacy-preserving way to estimate the total sample size with a differential privacy guarantee. Experiments on four benchmark datasets show that FedSampling can effectively improve the performance of federated learning.
#3414
Dual Personalization on Federated Recommendation
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available.
#SV5619
A Survey of Federated Evaluation in Federated Learning
In traditional machine learning, it is trivial to conduct model evaluation since all data samples are managed centrally by a server. However, model evaluation becomes a challenging problem in federated learning (FL), which is called federated evaluation in this work. This is because clients do not expose their original data to preserve data privacy. Federated evaluation plays a vital role in client selection, incentive mechanism design, malicious attack detection, etc. In this paper, we provide the first comprehensive survey of existing federated evaluation methods. Moreover, we explore various applications of federated evaluation for enhancing FL performance and finally present future research directions by envisioning some challenges.
#2670
FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation
Vertical federated learning (VFL) allows an active party with labeled data to leverage auxiliary features from the passive parties to improve model performance. Concerns about the private feature and label leakage in both the training and inference phases of VFL have drawn wide research attention. In this paper, we propose a general privacy-preserving vertical federated deep learning framework called FedPass, which leverages adaptive obfuscation to protect the feature and label simultaneously. Strong privacy-preserving capabilities about private features and labels are theoretically proved (in Theorems 1 and 2).
Extensive experimental results with different datasets and network architectures also justify the superiority of FedPass against existing methods in light of its near-optimal trade-off between privacy and model performance.
#5092
FedHGN: A Federated Framework for Heterogeneous Graph Neural Networks
Heterogeneous graph neural networks (HGNNs) can learn from typed and relational graph data more effectively than conventional GNNs. With larger parameter spaces, HGNNs may require more training data, which is often scarce in real-world applications due to privacy regulations (e.g., GDPR). Federated graph learning (FGL) enables multiple clients to train a GNN collaboratively without sharing their local data. However, existing FGL methods mainly focus on homogeneous GNNs or knowledge graph embeddings; few have considered heterogeneous graphs and HGNNs. In federated heterogeneous graph learning, clients may have private graph schemas. Conventional FL/FGL methods attempting to define a global HGNN model would violate schema privacy. To address these challenges, we propose FedHGN, a novel and general FGL framework for HGNNs. FedHGN adopts schema-weight decoupling to enable schema-agnostic knowledge sharing and employs coefficients alignment to stabilize the training process and improve HGNN performance. With better privacy preservation, FedHGN consistently outperforms local training and conventional FL methods on three widely adopted heterogeneous graph datasets with varying client numbers. The code is available at https://github.com/cynricfu/FedHGN.
Tuesday 22nd August
11:45-12:45
ML: Explainable/Interpretable Machine Learning
#SV5473
Even If Explanations: Prior Work, Desiderata & Benchmarks for Semi-Factual XAI
Recently, eXplainable AI (XAI) research has focused on counterfactual explanations as post-hoc justifications for AI-system decisions (e.g., a customer refused a loan might be told “if you asked for a loan with a shorter term, it would have been approved”). Counterfactuals explain what changes to the input-features of an AI system change the output-decision. However, there is a sub-type of counterfactual, semi-factuals, that have received less attention in AI (though the Cognitive Sciences have studied them more). This paper surveys semi-factual explanation, summarising historical and recent work. It defines key desiderata for semi-factual XAI, reporting benchmark tests of historical algorithms (as well as a novel, naïve method) to provide a solid basis for future developments.
#SV5501
Benchmarking eXplainable AI – A Survey on Available Toolkits and Open Challenges
The goal of Explainable AI (XAI) is to make the reasoning of a machine learning model accessible to humans, such that users of an AI system can evaluate and judge the underlying model. Due to the blackbox nature of XAI methods it is, however, hard to disentangle the contribution of a model and the explanation method to the final output. It might be unclear on whether an unexpected output is caused by the model or the explanation method. Explanation models, therefore, need to be evaluated in technical (e.g. fidelity to the model) and user-facing (correspondence to domain knowledge) terms. A recent survey has identified 29 different automated approaches to quantitatively evaluate explanations. In this work, we take an additional perspective and analyse which toolkits and data sets are available. We investigate which evaluation metrics are implemented in the toolkits and whether they produce the same results. We find that only a few aspects of explanation quality are currently covered, data sets are rare and evaluation results are not comparable across different toolkits. Our survey can serve as a guide for the XAI community for identifying future directions of research, and most notably, standardisation of evaluation.
#2103
Explainable Reinforcement Learning via a Causal World Model
Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
#2250
DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series
Time series forecasting is prevalent in various real-world applications. Despite the promising results of deep learning models in time series forecasting, especially the Recurrent Neural Networks (RNNs), the explanations of time series models, which are critical in high-stakes applications, have received little attention. In this paper, we propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM. Conventionally, the interpretability of RNNs only concentrates on the variable importance and time importance. We additionally distinguish between the instantaneous influence of new coming data and the long-term effects of historical data. Specifically, DeLELSTM consists of two components, i.e., standard LSTM and tensorized LSTM. The tensorized LSTM assigns each variable with a unique hidden state making up a matrix h(t), and the standard LSTM models all the variables with a shared hidden state H(t). By decomposing the H(t) into the linear combination of past information h(t-1) and the fresh information h(t)-h(t-1), we can get the instantaneous influence and the long-term effect of each feature. In addition, the advantage of linear regression also makes the explanation transparent and clear. We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets. Extensive experiments show that the proposed method achieves competitive performance against the baseline methods and provides a reliable explanation relative to domain knowledge.
#J5943
On Tackling Explanation Redundancy in Decision Trees (Extended Abstract)
Claims about the interpretability of decision trees can be traced back to the origins of machine learning (ML). Indeed, given some input consistent with a decision tree’s path, the explanation for the resulting prediction consists of the features in that path. Moreover, a growing number of works propose the use of decision trees, and of other so-called interpretable models, as a possible solution for deploying ML models in high-risk applications. This paper overviews recent theoretical and practical results which demonstrate that for most decision trees, tree paths exhibit so-called explanation redundancy, in that logically sound explanations can often be significantly more succinct than what the features in the path dictates. More importantly, such decision tree explanations can be computed in polynomial-time, and so can be produced with essentially no effort other than traversing the decision tree. The experimental results, obtained on a large range of publicly available decision trees, support the paper’s claims.
#3127
Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size
Tsetlin Machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). As such, they become less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning — Clause Size Constrained TMs (CSC-TMs) — where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches one literal. We finally analyze CSC-TM power consumption and derive new convergence properties.
Tuesday 22nd August
11:45-12:45
CV: Segmentation (1/2)
#644
Dichotomous Image Segmentation with Frequency Priors
Dichotomous image segmentation (DIS) has a wide range of real-world applications and gained increasing research attention in recent years. In this paper, we propose to tackle DIS with informative frequency priors. Our model, called FP-DIS, stems from the fact that prior knowledge in the frequency domain can provide valuable cues to identify fine-grained object boundaries. Specifically, we propose a frequency prior generator to jointly utilize a fixed filter and learnable filters to extract informative frequency priors. Before embedding the frequency priors into the network, we first harmonize the multi-scale side-out features to reduce their heterogeneity. This is achieved by our feature harmonization module, which is based on a gating mechanism to harmonize the grouped features. Finally, we propose a frequency prior embedding module to embed the frequency priors into multi-scale features through an adaptive modulation strategy. Extensive experiments on the benchmark dataset, DIS5K, demonstrate that our FP-DIS outperforms state-of-the-art methods by a large margin in terms of key evaluation metrics.
#2989
ICDA: Illumination-Coupled Domain Adaptation Framework for Unsupervised Nighttime Semantic Segmentation
The performance of nighttime semantic segmentation has been significantly improved thanks to recent unsupervised methods. However, these methods still suffer from complex domain gaps, i.e., the challenging illumination gap and the inherent dataset gap. In this paper, we propose the illumination-coupled domain adaptation framework(ICDA) to effectively avoid the illumination gap and mitigate the dataset gap by coupling daytime and nighttime images as a whole with semantic relevance. Specifically, we first design a new composite enhancement method(CEM) that considers not only illumination but also spatial consistency to construct the source and target domain pairs, which provides the basic adaptation unit for our ICDA. Next, to avoid the illumination gap, we devise the Deformable Attention Relevance(DAR) module to capture the semantic relevance inside each domain pair, which can couple the daytime and nighttime images at the feature level and adaptively guide the predictions of nighttime images. Besides, to mitigate the dataset gap and acquire domain-invariant semantic relevance, we propose the Prototype-based Class Alignment(PCA) module, which improves the usage of category information and performs fine-grained alignment. Extensive experiments show that our method reduces the complex domain gaps and achieves state-of-the-art performance for nighttime semantic segmentation. Our code is available at https://github.com/chenghaoDong666/ICDA.
#4730
Spatially Covariant Lesion Segmentation
Patterns in medical images are usually more structured than those in natural images and therefore this adds flexibility and elasticity to resource-limited clinical applications by injecting proper priors into neural networks.
In this paper, we propose spatially covariant pixel-aligned classifier (SCP) to trade-off between computational efficiency and accuracy for lesion segmentation in human brain and liver.
SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights, the parameters of which are obtained along with the backbone network training and later used for generating network weights to capture spatially variant contextual information.
We demonstrate the effectiveness and efficiency of the proposed SCP using two lesion segmentation tasks using different imaging sources: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography.
The network using SCP has achieved 23.8%, 64.9% and 74.7% reduction in GPU memory usage, FLOPs, and network size with similar or better accuracy for lesion segmentation.
#2077
Fluid Dynamics-Inspired Network for Infrared Small Target Detection
Most infrared small target detection (ISTD) networks focus on building effective neural blocks or feature fusion modules but none describes the ISTD process from the image evolution perspective. The directional evolution of image pixels influenced by convolution, pooling and surrounding pixels is analogous to the movement of fluid elements constrained by surrounding variables ang particles. Inspired by this, we explore a novel research routine by abstracting the movement of pixels in the ISTD process as the flow of fluid in fluid dynamics (FD). Specifically, a new Fluid Dynamics-Inspired Network (FDI-Net) is devised for ISTD. Based on Taylor Central Difference (TCD) method, the TCD feature extraction block is designed, where convolution and Transformer structures are combined for local and global information. The pixel motion equation during the ISTD process is derived from the Navier–Stokes (N-S) equation, constructing a N-S Refinement Module that refines extracted features with edge details. Thus, the TCD feature extraction block determines the primary movement direction of pixels during detection, while the N-S Refinement Module corrects some skewed directions of the pixel stream to supplement the edge details. Experiments on IRSTD-1k and SIRST demonstrate that our method achieves SOTA performance in terms of evaluation metrics.
#2261
Contour-based Interactive Segmentation
Recent advances in interactive segmentation (IS)
allow speeding up and simplifying image editing
and labeling greatly. The majority of modern IS
approaches accept user input in the form of clicks.
However, using clicks may require too many user
interactions, especially when selecting small ob-
jects, minor parts of an object, or a group of ob-
jects of the same type. In this paper, we consider
such a natural form of user interaction as a loose
contour, and introduce a contour-based IS method.
We evaluate the proposed method on the standard
segmentation benchmarks, our novel UserContours
dataset, and its subset UserContours-G containing
difficult segmentation cases. Through experiments,
we demonstrate that a single contour provides the
same accuracy as multiple clicks, thus reducing the
required amount of user interactions.
Tuesday 22nd August
11:45-12:45
CV: Vision and Language (1/2)
#1918
Contrastive Learning for Sign Language Recognition and Translation
There are two problems that widely exist in current end-to-end sign language processing architecture. One is the CTC spike phenomenon which weakens the visual representational ability in Continuous Sign Language Recognition (CSLR). The other one is the exposure bias problem which leads to the accumulation of translation errors during inference in Sign Language Translation (SLT). In this paper, we tackle these issues by introducing contrast learning, aiming to enhance both visual-level feature representation and semantic-level error tolerance. Specifically, to alleviate CTC spike phenomenon and enhance visual-level representation, we design a visual contrastive loss by minimizing visual feature distance between different augmented samples of frames in one sign video, so that the model can further explore features by utilizing numerous unlabeled frames in an unsupervised way. To alleviate exposure bias problem and improve semantic-level error tolerance, we design a semantic contrastive loss by re-inputting the predicted sentence into semantic module and comparing features of ground-truth sequence and predicted sequence, for exposing model to its own mistakes. Besides, we propose two new metrics, i.e., Blank Rate and Consecutive Wrong Word Rate to directly reflect our improvement on the two problems. Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
#1813
A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previously structured map method provides the average historical appearance of visited nodes, while it ignores distinctive contributions of various images and potent information retention in the reasoning process. This work proposes a dual semantic-aware recurrent global-adaptive network (DSRG) to address the above problems. First, DSRG proposes an instruction-guidance linguistic module (IGL) and an appearance-semantics visual module (ASV) for boosting vision and language semantic learning respectively. For the memory mechanism, a global adaptive aggregation module (GAA) is devised for explicit panoramic observation fusion, and a recurrent memory fusion module (RMF) is introduced to supply implicit temporal hidden states. Extensive experimental results on the R2R and REVERIE datasets demonstrate that our method achieves better performance than existing methods. Code is available at https://github.com/CrystalSixone/DSRG.
#4777
Vision Language Navigation with Knowledge-driven Environmental Dreamer
Vision-language navigation (VLN) requires an agent to perceive visual observation in a house scene and navigate step-by-step following natural language instruction. Due to the high cost of data annotation and data collection, current VLN datasets provide limited instruction-trajectory data samples. Learning vision-language alignment for VLN from limited data is challenging since visual observation and language instruction are both complex and diverse. Previous works only generate augmented data based on original scenes while failing to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a method that leverages the knowledge of the embodied environment and generates unseen scenes for a navigation agent to learn. Generating an unseen environment with texture consistency and structure consistency is challenging. To address this problem, we incorporate three knowledge-driven regularization objectives into the KED and adopt a reweighting mechanism for self-adaptive optimization. Our KED method is able to generate unseen embodied environments without extra annotations. We use KED to successfully generate 270 houses and 500K instruction-trajectory pairs. The navigation agent with the KED method outperforms the state-of-the-art methods on various VLN benchmarks, such as R2R, R4R, and RxR. Both qualitative and quantitative experiments prove that our proposed KED method is able to high-quality augmentation data with texture consistency and structure consistency.
#4473
Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.
#1932
Incorporating Unlikely Negative Cues for Distinctive Image Captioning
While recent neural image captioning models have shown great promise in terms of automatic metrics, they still struggle with generating generic sentences, which limits their use to only a handful of simple scenarios. On the other hand, negative training has been suggested as an effective way to prevent models from producing frequent yet meaningless sentences. However, when applied to image captioning, this approach may overlook low-frequency but generic and vague sentences, which can be problematic when dealing with diverse and changeable visual scenes. In this paper, we introduce a approach to improve image captioning by integrating negative knowledge that focuses on preventing the model from producing undesirable generic descriptions while addressing previous limitations. We accomplish this by training a negative teacher model that generates image-wise generic sentences with retrieval entropy-filtered data. Subsequently, the student model is required to maximize the distance with multi-level negative knowledge transferring for optimal guiding. Empirical results evaluated on MS COCO benchmark confirm that our plug-and-play framework incorporating unlikely negative knowledge leads to significant improvements in both accuracy and diversity, surpassing previous state-of-the-art methods for distinctive image captioning.
#3040
Dual Video Summarization: From Frames to Captions
Video summarization and video captioning both condense the video content from the perspective of visual and text modes, i.e. the keyframe selection and language description generation. Existing video-and-language learning models commonly sample multiple frames for training instead of observing all. These sampled deputies greatly improve computational efficiency, but do they represent the original video content enough with no more redundancy? In this work, we propose a dual video summarization framework and verify it in the context of video captioning. Given the video frames, we firstly extract the visual representation based on the ViT model fine-tuned on the video-text domain. Then we summarize the keyframes according to the frame-lever score. To compress the number of keyframes as much as possible while ensuring the quality of captioning, we learn a cross-modal video summarizer to select the most semantically consistent frames according to the pseudo score label. Top K frames ( K is no more than 3% of the entire video.) are chosen to form the video representation. Moreover, to evaluate the static appearance and temporal information of video, we design the ranking scheme of video representation from two aspects: feature-oriented and sequence-oriented. Finally, we generate the descriptions with a lightweight LSTM decoder. The experiment results on the MSR-VTT and MSVD dataset reveal that, for the generative task as video captioning, a small number of keyframes can convey the same semantic information to perform well on captioning, or even better than the original sampling.
Tuesday 22nd August
11:45-12:45
Multidisciplinary Topics and Applications (1/4)
#1957
Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies
This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
#2793
Revisiting the Evaluation of Deep Learning-Based Compiler Testing
A high-quality program generator is essential to effective automated compiler testing. Engineering such a program generator is difficult, time-consuming, and specific to the language under testing, thus requiring tremendous efforts from human experts with language-specific domain knowledge. To avoid repeatedly writing program generators for different languages, researchers recently proposed a language-agnostic approach based on deep learning techniques to automatically learn a program generator (referred to as DLG) from existing programs. Evaluations show that DLGs outperform Language-Specific Program Generators (LSGs) in testing compilers.
However, we argue that it is unfair to use LSGs as baselines to evaluate DLGs. LSGs aim to validate compiler optimizations by only generating compilable, well-defined test programs; this restriction inevitably impairs the diversity of the language features used in the generated programs. In contrast, DLGs do not aim to validate the correctness of compiler optimizations, and its generated programs are not guaranteed to be well-defined or even compilable. Therefore, it is not surprising that DLG-generated programs are more diverse in terms of used language features than LSG-generated ones.
This study revisits the evaluation of DLGs, and proposes a new, fair, simple yet strong baseline named Kitten for evaluating DLGs. Given a dataset consisting of human-written programs, instead of using deep learning techniques to learn a program generator, Kitten directly derives new programs by mutating the programs in the dataset. Extensive experiments with more than 1,500 CPU-hours demonstrate that the state-of-the-art DLGs fail to compete against such a simple baseline: 3 v.s. 1,750 hang bugs, 1 v.s. 34 distinct compiler crashes. We believe that DLGs still have a large room for improvement.
#3566
GLPocket: A Multi-Scale Representation Learning Approach for Protein Binding Site Prediction
Protein binding site prediction is an important prerequisite for the discovery of new drugs. Usually, natural 3D U-Net is adopted as the standard site prediction framework to do per-voxel binary mask classification. However, this scheme only performs feature extraction for single-scale samples, which may bring the loss of global or local information, resulting in incomplete, artifacted or even missed predictions. To tackle this issue, we propose a network called GLPocket, which is based on the Lmser (Least mean square error reconstruction) network and utilizes multi-scale representation to predict binding sites. Firstly, GLPocket uses Target Cropping Block (TCB) for targeted prediction. TCB selects the local interested feature from the global representations to perform concentrated prediction, and reduces the volume of feature maps to be calculated by 82% without adding additional parameters. It integrates global distribution information into local regions, making prediction more concentrated on decoding stage. Secondly, GLPocket establishes long-range relationship of patches within the local region with Transformer Block (TB), to enrich local context semantic information. Experiments show that GLPocket improves by 0.5%-4% on DCA Top-n prediction compared with previous state-of-the-art methods on four datasets. Our code has been released in https://github.com/CMACH508/GLPocket.
#4766
Toward Convex Manifolds: A Geometric Perspective for Deep Graph Clustering of Single-cell RNA-seq Data
The deep clustering paradigm has shown great potential for discovering complex patterns that can reveal cell heterogeneity in single-cell RNA sequencing data. This paradigm involves two training phases: pretraining based on a pretext task and fine-tuning using pseudo-labels. Although current models yield promising results, they overlook the geometric distortions that regularly occur during the training process. More precisely, the transition between the two phases results in a coarse flattening of the latent structures, which can deteriorate the clustering performance. In this context, existing methods perform euclidean-based embedding clustering without ensuring the flatness and convexity of the latent manifolds. To address this problem, we incorporate two mechanisms. First, we introduce an overclustering loss to flatten the local curves. Second, we propose an adversarial mechanism to adjust the global geometric configuration. The second mechanism gradually transforms the latent structures into convex ones. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
#SC23
FastGR: Global Routing on CPU-GPU with Heterogeneous Task Graph Scheduler (Extended Abstract)
Routing is critical to the physical design flow of integrated circuits. However, with the rapid growth in design sizes, routing has become the runtime bottleneck in the flow. In this paper, we propose a global routing framework, FastGR, with GPU-accelerated pattern routing and a heterogeneous task graph scheduler to accelerate the modern global router and improve its effectiveness.
#SV5557
A Systematic Survey of Chemical Pre-trained Models
Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications, ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks. Despite the prosperity, there lacks a systematic review of this fast-growing field. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both machine learning and scientific communities.
Tuesday 22nd August
11:45-12:45
Data Mining (1/3)
#3221
Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification
Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noises of graphs induced by hidden anomalies connected with considerable benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noises and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world datasets of GAD demonstrate that the proposed framework achieves significantly better detection quality compared with the state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.
#3475
Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks?
As deep learning gains popularity in modelling dynamical systems, we expose an underappreciated misunderstanding relevant to modelling dynamics on networks. Strongly influenced by graph neural networks, latent vertex embeddings are naturally adopted in many neural dynamical network models. However, we show that embeddings tend to induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. Recognising that previous studies narrowly focus on short-term predictions during the transient phase of a flow, we propose three tests for correct long-term behaviour, and illustrate how an embedding-based dynamical model fails these tests, and analyse the causes, particularly through the lens of topological conjugacy. In doing so, we show that the difficulties can be avoided by not using embedding. We propose a simple embedding-free alternative based on parametrising two additive vector-field components. Through extensive experiments, we verify that the proposed model can reliably recover a broad class of dynamics on different network topologies from time series data.
#SV5593
Generative Diffusion Models on Graphs: Methods and Applications
Diffusion models, as a novel generative paradigm, have achieved remarkable success in various image generation tasks such as image inpainting, image-to-text translation, and video generation. Graph generation is a crucial computational task on graphs with numerous real-world applications. It aims to learn the distribution of given graphs and then generate new graphs. Given the great success of diffusion models in image generation, increasing efforts have been made to leverage these techniques to advance graph generation in recent years. In this paper, we first provide a comprehensive overview of generative diffusion models on graphs, In particular, we review representative algorithms for three variants of graph diffusion models, i.e., Score Matching with Langevin Dynamics (SMLD), Denoising Diffusion Probabilistic Model (DDPM), and Score-based Generative Model (SGM). Then, we summarize the major applications of generative diffusion models on graphs with a specific focus on molecule and protein modeling. Finally, we discuss promising directions in generative diffusion models on graph-structured data.
#299
Model Conversion via Differentially Private Data-Free Distillation
While massive valuable deep models trained on large-scale data have been released to facilitate the artificial intelligence community, they may encounter attacks in deployment which leads to privacy leakage of training data. In this work, we propose a learning approach termed differentially private data-free distillation (DPDFD) for model conversion that can convert a pretrained model (teacher) into its privacy-preserving counterpart (student) via an intermediate generator without access to training data. The learning collaborates three parties in a unified way. First, massive synthetic data are generated with the generator. Then, they are fed into the teacher and student to compute differentially private gradients by normalizing the gradients and adding noise before performing descent. Finally, the student is updated with these differentially private gradients and the generator is updated by taking the student as a fixed discriminator in an alternate manner. In addition to a privacy-preserving student, the generator can generate synthetic data in a differentially private way for other down-stream tasks. We theoretically prove that our approach can guarantee differential privacy and well convergence. Extensive experiments that significantly outperform other differentially private generative approaches demonstrate the effectiveness of our approach.
#4929
Denoised Self-Augmented Learning for Social Recommendation
Social recommendation is gaining increasing attention in various online applications, including e-commerce and online streaming, where social information is leveraged to improve user-item interaction modeling. Recently, Self-Supervised Learning (SSL) has proven to be remarkably effective in addressing data sparsity through augmented learning tasks. Inspired by this, researchers have attempted to incorporate SSL into social recommendation by supplementing the primary supervised task with social-aware self-supervised signals. However, social information can be unavoidably noisy in characterizing user preferences due to the ubiquitous presence of interest-irrelevant social connections, such as colleagues or classmates who do not share many common interests. To address this challenge, we propose a novel social recommender called the Denoised Self-Augmented Learning paradigm (DSL). Our model not only preserves helpful social relations to enhance user-item interaction modeling but also enables personalized cross-view knowledge transfer through adaptive semantic alignment in embedding space. Our experimental results on various recommendation benchmarks confirm the superiority of our DSL over state-of-the-art methods. We release our model implementation at: https://github.com/HKUDS/DSL.
#752
Federated Probabilistic Preference Distribution Modelling with Compactness Co-Clustering for Privacy-Preserving Multi-Domain Recommendation
With the development of modern internet techniques, Cross-Domain Recommendation (CDR) systems have been widely exploited for tackling the data-sparsity problem. Meanwhile most current CDR models assume that user-item interactions are accessible across different domains. However, such knowledge sharing process will break the privacy protection policy. In this paper, we focus on the Privacy-Preserving Multi-Domain Recommendation problem (PPMDR). The problem is challenging since different domains are sparse and heterogeneous with the privacy protection. To tackle the above issues, we propose Federated Probabilistic Preference Distribution Modelling (FPPDM). FPPDM includes two main components, i.e., local domain modelling component and global server aggregation component with federated learning strategy. The local domain modelling component aims to exploit user/item preference distributions using the rating information in the corresponding domain. The global server aggregation component is set to combine user characteristics across domains. To better extract semantic neighbors information among the users, we further provide compactness co-clustering strategy in FPPDM ++ to cluster the users with similar characteristics. Our empirical studies on benchmark datasets demonstrate that FPPDM/ FPPDM ++ significantly outperforms the state-of-the-art models.
Tuesday 22nd August
11:45-12:45
Agent-based and Multi-agent Systems (1/4)
#SC19
Task Allocation on Networks with Execution Uncertainty
We study a single task allocation problem where each worker connects to some other workers to form a network and the task requester only connects to some of the workers. The goal is to design an allocation mechanism such that each worker is incentivized to invite her neighbours to join the allocation, although they are competing for the task. Moreover, the performance of each worker is uncertain, which is modelled as the quality level of her task execution. The literature has proposed solutions to tackle the uncertainty problem by paying them after verifying their execution. Here, we extend the problem to the network setting. We propose a new mechanism that guarantees that inviting more workers and reporting/performing according to her true ability is a dominant strategy for each worker. We believe that the new solution can be widely applied in the digital economy powered by social connections such as crowdsourcing.
#1363
Artificial Agents Inspired by Human Motivation Psychology for Teamwork in Hazardous Environments
Multi-agent literature explores personifying artificial agents with personality, emotions or cognitive biases to produce “typical”, believable agents. In
this study, we demonstrate the potential of endowing artificial agents with a motivation, using human implicit motivation psychology theory that introduces 3 motive profiles – power, achievement and affiliation, to create diverse, risk-aware agents. We first devise a framework to model these motivated agents (or agents with any inherent behavior), that can activate different strategies depending on the circumstances. We conduct experiments on a fire-fighting task domain, evaluate how motivated teams perform, and draw conclusions on appropriate team compositions to be deployed in environments with different risk levels. Our framework generates predictable agents as their resulting behaviors align with the inherent characteristics of their motives. We find that motivational diversity within teams is beneficial in dynamic collaborative environments, especially as the task risk level increases. Furthermore, we observed that the best composition in terms of the performance metrics used to evaluate team compositions, does not remain the same as the collaboration level required to achieve goals changes. These results have implications for future designs of risk-aware autonomous teams and Human-AI teams, as they highlight the prospects of creating better artificial teammates and performance gains that could be achieved through anthropomorphized motivated agents.
#619
Controlling Neural Style Transfer with Deep Reinforcement Learning
Controlling the degree of stylization in the Neural Style Transfer (NST) is a little tricky since it usually needs hand-engineering on hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps. It is a user-easily-controlled style-transfer method. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.
#4015
Optimal Anytime Coalition Structure Generation Utilizing Compact Solution Space Representation
Coalition formation is a central approach for multiagent coordination. A crucial part of coalition formation that is extensively studied in AI is coalition structure generation: partitioning agents into coalitions to maximize overall value.
In this paper, we propose a novel method for coalition structure generation by introducing a compact and efficient representation of coalition structures. Our representation partitions the solution space into smaller, more manageable subspaces that gather structures containing coalitions of specific sizes. Our proposed method combines two new algorithms, one which leverages our compact representation and a branch-and-bound technique to generate optimal coalition structures, and another that utilizes a preprocessing phase to identify the most promising sets of coalitions to evaluate. Additionally, we show how parts of the solution space can be gathered into groups to avoid their redundant evaluation and we investigate the computational gain that is achieved by avoiding that redundant processing. Through this approach, our algorithm is able to prune the solution space more efficiently. Our results show that the proposed algorithm is superior to prior state-of-the-art methods in generating optimal coalition structures under several value distributions.
#962
Quick Multi-Robot Motion Planning by Combining Sampling and Search
We propose a novel algorithm to solve multi-robot motion planning (MRMP) rapidly, called Simultaneous Sampling-and-Search Planning (SSSP). Conventional MRMP studies mostly take the form of two-phase planning that constructs roadmaps and then finds inter-robot collision-free paths on those roadmaps. In contrast, SSSP simultaneously performs roadmap construction and collision-free pathfinding. This is realized by uniting techniques of single-robot sampling-based motion planning and search techniques of multi-agent pathfinding on discretized spaces. Doing so builds the small search space, leading to quick MRMP. SSSP ensures finding a solution eventually if exists. Our empirical evaluations in various scenarios demonstrate that SSSP significantly outperforms standard approaches to MRMP, i.e., solving more problem instances much faster. We also applied SSSP to planning for 32 ground robots in a dense situation.
#978
Helpful Information Sharing for Partially Informed Planning Agents
In many real-world settings, an autonomous agent may not have sufficient information or sensory capabilities to accomplish its goals, even when they are achievable. In some cases, the needed information can be provided by another agent, but information sharing might be costly due to limited communication bandwidth and other constraints. We address the problem of Helpful Information Sharing (HIS), which focuses on selecting minimal information to reveal to a partially informed agent in order to guarantee it can achieve its goal. We offer a novel compilation of HIS to a classical planning problem, which can be solved efficiently by any off-the-shelf planner. We provide guarantees of optimality for our approach and describe its extensions to maximize robustness and support settings in which the agent needs to decide which sensors to deploy in the environment. We demonstrate the power of our approaches on a set of standard benchmarks as well as on a novel benchmark.
Tuesday 22nd August
11:45-12:45
Knowledge Representation and Reasoning (1/4)
#4418
Relative Inconsistency Measures for Indefinite Databases with Denial Constraints
Handling conflicting information is an important challenge in AI. Measuring inconsistency is an approach that provides ways to quantify the severity of inconsistency and helps understanding the primary sources of conflicts. In particular, a relative inconsistency measure computes, by some criteria, the proportion of the knowledge base that is inconsistent. In this paper we investigate relative inconsistency measures for indefinite databases, which allow for indefinite or partial information which is formally expressed by means of disjunctive tuples. We introduce a postulate-based definition of relative inconsistency measure for indefinite databases with denial constraints, and investigate the compliance of some relative inconsistency measures with rationality postulates for indefinite databases as well as for the special case of definite databases. Finally, we investigate the complexity of the problem of computing the value of the proposed relative inconsistency measures as well as of the problems of deciding whether the inconsistency value is lower than, greater than, or equal to a given threshold for indefinite and definite databases.
#J5922
Creative Problem Solving in Artificially Intelligent Agents: A Survey and Framework (Extended Abstract)
Creative Problem Solving (CPS) is a sub-area within artificial intelligence that focuses on methods for solving off-nominal, or anomalous problems in autonomous systems. Despite many advancements in planning and learning in AI, resolving novel problems or adapting existing knowledge to a new context, especially in cases where the environment may change in unpredictable ways, remains a challenge. To stimulate further research in CPS, we contribute a definition and a framework of CPS, which we use to categorize existing AI methods in this field. We conclude our survey with open research questions, and suggested future directions.
#1679
Temporal Datalog with Existential Quantification
Existential rules, also known as tuple-generating dependencies (TGDs) or Datalog+/- rules, are heavily studied in the communities of Knowledge Representation and Reasoning, Semantic Web, and Databases, due to their rich modelling capabilities. In this paper we consider TGDs in the temporal setting, by introducing and studying DatalogMTLE—an extension of metric temporal Datalog (DatalogMTL) obtained by allowing for existential rules in programs. We show that DatalogMTLE is undecidable even in the restricted cases of guarded and weakly-acyclic programs. To address this issue we introduce uniform semantics which, on the one hand, is well-suited for modelling temporal knowledge as it prevents from unintended value invention and, on the other hand, provides decidability of reasoning; in particular, it becomes 2-EXPSPACE-complete for weakly-acyclic programs but remains undecidable for guarded programs. We provide an implementation for the decidable case and demonstrate its practical feasibility.
Thus we obtain an expressive, yet decidable, rule-language and a system which is suitable for complex temporal reasoning with existential rules.
#1223
Augmenting Automated Spectrum Based Fault Localization for Multiple Faults
Spectrum-based Fault Localization (SBFL) uses the coverage of test cases and their outcome (pass/fail) to predict the “suspiciousness” of program components, e.g., lines of code. SBFL is, perhaps, the most successful fault localization technique due to its simplicity and scalability. However, SBFL heuristics do not perform well in scenarios where a program may have multiple faulty components. In this work, we propose a new algorithm that “augments” previously proposed SBFL heuristics to produce a ranked list where faulty components ranked low by base SBFL metrics are ranked significantly higher. We implement our ideas in a tool, ARTEMIS, that attempts to “bubble up” faulty components which are ranked lower by base SBFL metrics. We compare our technique to the most popular SBFL metrics and demonstrate statistically significant improvement in the developer effort for fault localization with respect to the basic strategies.
#3153
Disentanglement of Latent Representations via Causal Interventions
The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
#4461
Efficient Computation of General Modules for ALC Ontologies
We present a method for extracting general modules for ontologies formulated in the description logic ALC. A module for an ontology is an ideally substantially smaller ontology that preserves all entailments for a user-specified set of terms. As such, it has applications such as ontology reuse and ontology analysis. Different from classical modules, general modules may use axioms not explicitly present in the input ontology, which allows for additional conciseness. So far, general modules have only been investigated for lightweight description logics.
We present the first work that considers the more expressive description logic ALC. In particular, our contribution is a new method based on uniform interpolation supported by some new theoretical results. Our evaluation indicates that our general modules are often smaller than classical modules and uniform interpolants computed by the state-of-the-art, and compared with uniform interpolants, can be computed in significantly shorter time. Moreover, our method can be used for, and in fact, improves the computation of uniform interpolants and classical modules.
Tuesday 22nd August
11:45-12:45
Uncertainty in AI (1/2)
#4438
Probabilistic Rule Induction from Event Sequences with Logical Summary Markov Models
Event sequences are widely available across application domains and there is a long history of models for representing and analyzing such datasets. Summary Markov models are a recent addition to the literature that help identify the subset of event types that influence event types of interest to a user. In this paper, we introduce logical summary Markov models, which are a family of models for event sequences that enable interpretable predictions through logical rules that relate historical predicates to the probability of observing an event type at any arbitrary position in the sequence. We illustrate their connection to prior parametric summary Markov models as well as probabilistic logic programs, and propose new models from this family along with efficient greedy search algorithms for learning them from data. The proposed models outperform relevant baselines on most datasets in an empirical investigation on a probabilistic prediction task. We also compare the number of influencers that various logical summary Markov models learn on real-world datasets, and conduct a brief exploratory qualitative study to gauge the promise of such symbolic models around guiding large language models for predicting societal events.
#2147
On the Complexity of Counterfactual Reasoning
We study the computational complexity of counterfactual reasoning in relation to the complexity of associational and interventional reasoning on structural causal models (SCMs). We show that counterfactual reasoning is no harder than associational or interventional reasoning on fully specified SCMs in the context of two computational frameworks. The first framework is based on the notion of treewidth and includes the classical variable elimination and jointree algorithms. The second framework is based on the more recent and refined notion of causal treewidth which is directed towards models with functional dependencies such as SCMs. Our results are constructive and based on bounding the (causal) treewidth of twin networks—used in standard counterfactual reasoning that contemplates two worlds, real and imaginary—to the (causal) treewidth of the underlying SCM structure. In particular, we show that the latter (causal) treewidth is no more than twice the former plus one. Hence, if associational or interventional reasoning is tractable on a fully specified SCM then counterfactual reasoning is tractable too. We extend our results to general counterfactual reasoning that requires contemplating more than two worlds and discuss applications of our results to counterfactual reasoning with partially specified SCMs that are coupled with data. We finally present empirical results that measure the gap between the complexities of counterfactual reasoning and associational/interventional reasoning on random SCMs.
#4402
Max Markov Chain
In this paper, we introduce Max Markov Chain (MMC), a novel model for sequential data with sparse correlations among the state variables.
It may also be viewed as a special class of approximate models for High-order Markov Chains (HMCs).
MMC is desirable for domains where the sparse correlations are long-term and vary in their temporal stretches.
Although generally intractable, parameter optimization for MMC can be solved analytically.
However, based on this result,
we derive an approximate solution that is highly efficient empirically.
When compared with HMC and approximate HMC models, MMC
combines better sample efficiency, model parsimony, and an outstanding computational advantage.
Such a quality allows MMC to scale to large domains
where the competing models would struggle to perform.
We compare MMC with several baselines with synthetic and real-world datasets to demonstrate MMC as a valuable alternative for stochastic modeling.
#2847
Approximate Inference in Logical Credal Networks
The Logical Credal Network or LCN is a recent probabilistic logic designed for effective aggregation and reasoning over multiple sources of imprecise knowledge. An LCN specifies a set of probability distributions over all interpretations of a set of logical formulas for which marginal and conditional probability bounds on their truth values are known. Inference in LCNs involves the exact solution of a non-convex non-linear program defined over an exponentially large number of non-negative real valued variables and, therefore, is limited to relatively small problems. In this paper, we present ARIEL — a novel iterative message-passing scheme for approximate inference in LCNs. Inspired by classical belief propagation for graphical models, our method propagates messages that involve solving considerably smaller local non-linear programs. Experiments on several classes of LCNs demonstrate clearly that ARIEL yields high quality solutions compared with exact inference and scales to much larger problems than previously considered.
#2836
The Hardness of Reasoning about Probabilities and Causality
We study formal languages which are capable of fully expressing quantitative probabilistic reasoning and do-calculus reasoning for causal effects, from a computational complexity perspective.
We focus on satisfiability problems whose instance formulas allow expressing many tasks in probabilistic and causal inference.
The main contribution of this work is establishing the exact computational complexity of these satisfiability problems.
We introduce a new natural complexity class, named succ∃R, which can be viewed as a succinct variant of the well-studied class ∃R, and show that these problems are complete for succ∃R.
Our results imply even stronger algorithmic limitations than were proven by Fagin, Halpern, and Megiddo (1990) and Mossé, Ibeling, and Icard (2022) for some variants of the standard languages used commonly in probabilistic and causal inference.
#4309
Quantifying Consistency and Information Loss for Causal Abstraction Learning
Structural causal models provide a formalism to express causal relations between variables of interest. Models and variables can represent a system at different levels of abstraction, whereby relations may be coarsened and refined according to the need of a modeller.
However, switching between different levels of abstraction requires evaluating a trade-off between the consistency and the information loss among different models.
In this paper we introduce a family of interventional measures that an agent may use to evaluate such a trade-off. We consider four measures suited for different tasks, analyze their properties, and propose algorithms to evaluate and learn causal abstractions. Finally, we illustrate the flexibility of our setup by empirically showing how different measures and algorithmic choices may lead to different abstractions.
Tuesday 22nd August
11:45-12:45
Early Career 1
#EC10
Pushing the Limits of Fairness in Algorithmic Decision-Making
Designing provably fair decision-making algorithms is a task of growing interest and importance. In this article, I argue that preference-based notions of fairness proposed decades ago in the economics literature and subsequently explored in-depth within computer science (specifically, within the field of computational social choice) are aptly suited for a wide range of modern decision-making systems, from conference peer review to recommender systems to participatory budgeting.
#EC6
AI and Multi-agent Systems for Real World Decision Making
Show Abstract
Hide Abstract
Game theory is a popular model for studying multi-agent systems. In this talk, I will present my work on modelling of adversarial multi-agent problems using Stackelberg game models, evolving from standard utility maximizing players to rich models of bounded rationality. A first line of work using utility maximizing adversary models introduced the model of audit games and threat screening games, with novel optimization methods for solving these problems at scale. A second line of work has looked at various aspects of learning bounded rational behavior and optimizing strategic decisions based on the learned models; these aspects include study of different classes of bounded rationality, scalable optimization methods for these highly non-linear models, and high fidelity learning of behavior model of multiple interacting agents from data. Overall, data-driven behavior models with principled strategic decision optimization presents many opportunities for research as well as applications for societal benefit.
#EC2
The Economics of Machine Learning
This survey overviews a new research agenda on the economics of machine learning, pursued at the Strategic IntelliGence for Machine Agent (SIGMA) Lab at UChicago. This overall research agenda has two themes: machine learning for economics and, conversely, economics for machine learning (ML). The first theme focuses on designing and analyzing ML algorithms for economic problems, ranging from foundational economic models to influential real-world applications such as recommender systems and national security. The second theme employs economic principles to study machine learning itself, such as the valuation and pricing of data, information and ML models, and designing incentive mechanisms to improve large-scale ML research peer reviews. While our research focuses primarily on developing methodologies, in each theme we also highlight some real-world impacts of these works, including ongoing large-scale live experiments and potential deployments for various applications.
Tuesday 22nd August
11:45-12:45
AI for Social Good – Humans and AI
#AI4SG5684
Group Sparse Optimal Transport for Sparse Process Flexibility Design
As a fundamental problem in Operations Research, sparse process flexibility design (SPFD) aims to design a manufacturing network across industries that achieves a trade-off between the efficiency and robustness of supply chains.
In this study, we propose a novel solution to this problem with the help of computational optimal transport techniques.
Given a set of supply-demand pairs, we formulate the SPFD task approximately as a group sparse optimal transport (GSOT) problem, in which a group of couplings between the supplies and demands is optimized with a group sparse regularizer.
We solve this optimization problem via an algorithmic framework of alternating direction method of multipliers (ADMM), in which the target network topology is updated by soft-thresholding shrinkage, and the couplings of the OT problems are updated via a smooth OT algorithm in parallel.
This optimization algorithm has guaranteed convergence and provides a generalized framework for the SPFD task, which is applicable regardless of whether the supplies and demands are balanced.
Experiments show that our GSOT-based method can outperform representative heuristic methods in various SPFD tasks.
Additionally, when implementing the GSOT method, the proposed ADMM-based optimization algorithm is comparable or superior to the commercial software Gurobi.
The code is available at https://github.com/Dixin-s-Lab/GSOT.
#AI4SG5788
Balancing Social Impact, Opportunities, and Ethical Constraints of Using AI in the Documentation and Vitalization of Indigenous Languages
In this paper we discuss how AI can contribute to support the documentation and vitalization of Indigenous languages and how that involves a delicate balancing of ensuring social impact, exploring technical opportunities, and dealing with ethical constraints. We start by surveying previous work on using AI and NLP to support critical activities of strengthening Indigenous and endangered languages and discussing key limitations of current technologies. After presenting basic ethical constraints of working with Indigenous languages and communities, we propose that creating and deploying language technology ethically with and for Indigenous communities forces AI researchers and engineers to address some of the main shortcomings and criticisms of current technologies. Those ideas are also explored in the discussion of a real case of development of large language models for Brazilian Indigenous languages.
#AI4SG5800
GreenFlow: A Computation Allocation Framework for Building Environmentally Sound Recommendation System
Given the enormous number of users and items, industrial cascade recommendation systems (RS) are continuously expanded in size and complexity to deliver relevant items, such as news, services, and commodities, to the appropriate users. In a real-world scenario with hundreds of thousands requests per second, significant computation is required to infer personalized results for each request, resulting in a massive energy consumption and carbon emission that raises concern.
This paper proposes GreenFlow, a practical computation allocation framework for RS, that considers both accuracy and carbon emission during inference. For each stage (e.g., recall, pre-ranking, ranking, etc.) of a cascade RS, when a user triggers a request, we define two actions that determine the computation: (1) the trained instances of models with different computational complexity; and (2) the number of items to be inferred in the stage. We refer to the combinations of actions in all stages as action chains. A reward score is estimated for each action chain, followed by dynamic primal-dual optimization considering both the reward and computation budget. Extensive experiments verify the effectiveness of the framework, reducing computation consumption by 41% in an industrial mobile application while maintaining commercial revenue. Moreover, the proposed framework saves approximately 5000kWh of electricity and reduces 3 tons of carbon emissions per day.
#AI4SG5813
Computationally Assisted Quality Control for Public Health Data Streams
Irregularities in public health data streams (like COVID-19 Cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important, outlying data points from thousands of public health data streams could assist an expert reviewer in identifying these irregularities. However, existing outlier detection frameworks perform poorly on this task because they do not account for the data volume or for the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. In an experiment where human experts evaluate FlaSH and existing methods (including deep learning approaches), FlaSH scales to the data volume of this task, matches or exceeds these other methods in mean accuracy, and identifies the outlier points that users empirically rate as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.
#AI4SG5836
A Quantitative Game-theoretical Study on Externalities of Long-lasting Humanitarian Relief Operations in Conflict Areas
Humanitarian relief operations are often accompanied by regional conflicts around the globe, at risk of deliberate, persistent and unpredictable attacks. However, the long-term channeling of aid resources into conflict areas may influence subsequent patterns of violence and expose local communities to new risks. In this paper, we quantitatively analyze the potential externalities associated with long-lasting humanitarian relief operations based on game-theoretical modeling and online planning approaches. Specifically, we first model the problem of long-lasting humanitarian relief operations in conflict areas as an online multi-stage rescuer-and-attacker interdiction game in which aid demands are revealed in an online fashion. Both models of single-source and multiple-source relief supply policy are established respectively, and two corresponding near-optimal online algorithms are proposed. In conjunction with a real case of anti-Ebola practice in conflict areas of DR Congo, we find that 1) long-lasting humanitarian relief operations aiming alleviation of crises in conflict areas can lead to indirect funding of local rebel groups; 2) the operations can activate the rebel groups to some extent, as evidenced by the scope expansion of their activities. Furthermore, the impacts of humanitarian aid intensity, frequency and supply policies on the above externalities are quantitatively analyzed, which will provide enlightening decision-making support for the implementation of related operations in the future.
#AI4SG5879
PARTNER: A Persuasive Mental Health and Legal Counselling Dialogue System for Women and Children Crime Victims
The World Health Organization has underlined the significance of expediting the preventive measures for crime against women and children to attain the United Nations Sustainable Development Goals 2030 (promoting well-being, gender equality, and equal access to justice). The crime victims typically need mental health and legal counselling support for their ultimate well-being and sometimes they need to be persuaded to seek desired support. Further, counselling interactions should adopt correct politeness and empathy strategies so that a warm, amicable, and respectful environment can be built to better understand the victims’ situations. To this end, we propose PARTNER, a Politeness and empAthy strategies-adaptive peRsuasive dialogue sysTem for meNtal health and LEgal counselling of cRime victims. For this, first, we create a novel mental HEalth and legAl counseLling conversational dataset HEAL, annotated with three distinct aspects, viz. counselling act, politeness strategy, and empathy strategy. Then, by formulating a novel reward function, we train a counselling dialogue system in a reinforcement learning setting to ensure correct counselling act, politeness strategy, and empathy strategy in the generated responses. Extensive empirical analysis and experimental results show that the proposed reward function ensures persuasive counselling responses with correct polite and empathetic tone in the generated responses. Further, PARTNER proves its efficacy to engage the victim by generating diverse and natural responses.
Tuesday 22nd August
15:30-16:50
Machine Learning (2/12)
#1809
Expanding the Hyperbolic Kernels: A Curvature-aware Isometric Embedding View
Modeling data relation as a hierarchical structure has proven beneficial for many learning scenarios, and the hyperbolic space, with negative curvature, can encode such data hierarchy without distortion. Several recent studies also show that the representation power of the hyperbolic space can be further improved by endowing the kernel methods. Unfortunately, the known kernel methods, developed in hyperbolic space, are limited by the adaptation capacity or distortion issues. This paper addresses the issues through a novel embedding function. To this end, we propose a curvature-aware isometric embedding, which establishes an isometry from the Poincar\’e model to a special reproducing kernel Hilbert space (RKHS). Then we can further define a series of kernels on this RKHS, including several positive definite kernels and an indefinite kernel. Thorough experiments are conducted to demonstrate the superiority of our proposals over existing-known hyperbolic and Euclidean kernels in various learning tasks, e.g., graph learning and zero-shot learning.
#3510
Boosting Few-Shot Open-Set Recognition with Multi-Relation Margin Loss
Few-shot open-set recognition (FSOSR) has become a great challenge, which requires classifying known classes and rejecting the unknown ones with only limited samples. Existing FSOSR methods mainly construct an ambiguous distribution of known classes from scarce known samples without considering the latent distribution information of unknowns, which degrades the performance of open-set recognition. To address this issue, we propose a novel loss function called multi-relation margin (MRM) loss that can plug in few-shot methods to boost the performance of FSOSR. MRM enlarges the margin between different classes by extracting the multi-relationship of paired samples to dynamically refine the decision boundary for known classes and implicitly delineate the distribution of unknowns. Specifically, MRM separates the classes by enforcing a margin while concentrating samples of the same class on a hypersphere with a learnable radius. In order to better capture the distribution information of each class, MRM extracts the similarity and correlations among paired samples, ameliorating the optimization of the margin and radius. Experiments on public benchmarks reveal that methods with MRM loss can improve the unknown detection of AUROC by a significant margin while correctly classifying the known classes.
#4043
Label Specific Multi-Semantics Metric Learning for Multi-Label Classification: Global Consideration Helps
In multi-label classification, it is critical to capitalize on complicated data structures and semantic relationships. Metric learning serves as an effective strategy to provide a better measurement of distances between examples. Existing works on metric learning for multi-label classification mainly learn one single global metric that characterizes latent semantic similarity between multi-label instances. However, such single-semantics metric exploitation approaches can not capture the intrinsic properties of multi-label data possessed of rich semantics. In this paper, the first attempt towards multi-semantics metric learning for multi-label classification is investigated. Specifically, the proposed LIMIC approach simultaneously learns one global and multiple label-specific local metrics by exploiting label-specific side information. The global metric is learned to capture the commonality across all the labels and label-specific local metrics characterize the individuality of each semantic space. The combination of global metric and label-specific local metrics is utilized to construct latent semantic space for each label, in which similar intra-class instances are pushed closer and inter-class instances are pulled apart. Furthermore, metric-based label correlation regularization is constructed to maintain similarity between correlated label spaces. Extensive experiments on benchmark multi-label data sets validate the superiority of our proposed approach in learning effective distance metrics for multi-label classification.
#4132
Lifelong Multi-view Spectral Clustering
In recent years, spectral clustering has become a well-known and effective algorithm in machine learning. However, traditional spectral clustering algorithms are designed for single-view data and fixed task setting. This can become a limitation when dealing with new tasks in a sequence, as it requires accessing previously learned tasks. Hence it leads to high storage consumption, especially for multi-view datasets. In this paper, we address this limitation by introducing a lifelong multi-view clustering framework. Our approach uses view-specific knowledge libraries to capture intra-view knowledge across different tasks. Specifically, we propose two types of libraries: an orthogonal basis library that stores cluster centers in consecutive tasks, and a feature embedding library that embeds feature relations shared among correlated tasks. When a new clustering task is coming, the knowledge is iteratively transferred from libraries to encode the new task, and knowledge libraries are updated according to the online update formulation. Meanwhile, basis libraries of different views are further fused into a consensus library with adaptive weights. Experimental results show that our proposed method outperforms other competitive clustering methods on multi-view datasets by a large margin.
#1572
Overlooked Implications of the Reconstruction Loss for VAE Disentanglement
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.
#4853
HOUDINI: Escaping from Moderately Constrained Saddles
We give polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function, we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. While analogous results exist for unconstrained and equality-constrained problems, we make progress on the major open question of convergence to second-order stationary points in the case of inequality constraints, without reliance on NP-oracles or altering the definitions to only account for certain constraints. Our results hold for both regular and stochastic gradient descent.
#3655
Open-world Semi-supervised Novel Class Discovery
Traditional semi-supervised learning tasks assume that both labeled and unlabeled data follow the same class distribution, but the realistic open-world scenarios are of more complexity with unknown novel classes mixed in the unlabeled set. Therefore, it is of great challenge to not only recognize samples from known classes but also discover the unknown number of novel classes within the unlabeled data. In this paper, we introduce a new open-world semi-supervised novel class discovery approach named OpenNCD, a progressive bi-level contrastive learning method over multiple prototypes. The proposed method is composed of two reciprocally enhanced parts. First, a bi-level contrastive learning method is introduced, which maintains the pair-wise similarity of the prototypes and the prototype group levels for better representation learning. Then, a reliable prototype similarity metric is proposed based on the common representing instances. Prototypes with high similarities will be grouped progressively for known class recognition and novel class discovery. Extensive experiments on three image datasets are conducted and the results show the effectiveness of the proposed method in open-world scenarios, especially with scarce known classes and labels.
#4339
GIDnets: Generative Neural Networks for Solving Inverse Design Problems via Latent Space Exploration
In a number of different fields, including Engeneering, Chemistry and Physics, the design of technological tools and device structures is increasingly supported by deep-learning based methods, which provide suggestions on crucial architectural choices based on the properties that these tools and structures should exhibit. The paper proposes a novel architecture, named GIDnet, to address this inverse design problem, which is based on exploring a suitably defined latent space associated with the possible designs. Among its distinguishing features, GIDnet is capable of identifying the most appropriate starting point for the exploration and of likely converging into a point corresponding to a design that is a feasible one. Results of a thorough experimental activity evidence that GIDnet outperforms earlier approaches in the literature.
Tuesday 22nd August
15:30-16:50
ML: Reinforcement Learning
#974
DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms, especially when facing sparse extrinsic rewards. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations. However, there is a gap between the novelty of an observation and an exploration, as both the stochasticity in the environment and the agent’s behavior may affect the observation. To evaluate exploratory behaviors accurately, we propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and then implement the reward with a discriminative forward model. Extensive experiments on both standard and advanced exploration tasks in MiniGrid show that DEIR quickly learns a better policy than the baselines. Our evaluations on ProcGen demonstrate both the generalization capability and the general applicability of our intrinsic reward.
#1236
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
#1426
Contrastive Learning and Reward Smoothing for Deep Portfolio Management
In this study, we used reinforcement learning (RL) models to invest assets in order to earn returns. The models were trained to interact with a simulated environment based on historical market data and learn trading strategies. However, using deep neural networks based on the returns of each period can be challenging due to the unpredictability of financial markets. As a result, the policies learned from training data may not be effective when tested in real-world situations. To address this issue, we incorporated contrastive learning and reward smoothing into our training process. Contrastive learning allows the RL models to recognize patterns in asset states that may indicate future price movements. Reward smoothing, on the other hand, serves as a regularization technique to prevent the models from seeking immediate but uncertain profits. We tested our method against various traditional financial techniques and other deep RL methods, and found it to be effective in both the U.S. stock market and the cryptocurrency market. Our source code is available at https://github.com/sophialien/FinTech-DPM.
#J5921
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning (Extended Abstract)
Keeping risk under control is often more crucial than maximizing expected rewards in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the upside volatility as much as the downside part. Instead, the (downside) semivariance, which captures negative deviation of a random variable under its mean, is more suitable for risk-averse proposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. steady reward distribution. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, the traditional dynamic programming methods are inapplicable to MSV problems directly. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on the policy gradient theory and the trust region method. Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.
#1078
Hierarchical State Abstraction based on Structural Information Principles
State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and condition structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.
#1728
Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning
One of the major challenges of the current offline reinforcement learning research is to deal with the distribution shift problem due to the change in state-action visitations for the new policy. To address this issue, we present a novel reward shifting-based method. Specifically, to regularize the behavior of the new policy at each state, we modify the reward to be received by the new policy by shifting it adaptively according to its proximity to the behavior policy, and apply the reward shifting along opposite directions for in-distribution actions and the ones not. In this way we are able to guide the learning procedure of the new policy itself by influencing the consequence of its actions explicitly, helping it to achieve a better balance between behavior constraints and policy improvement. Empirical results on the popular D4RL benchmarks show that the proposed method obtains competitive performance compared to the state-of-art baselines.
#1272
Scaling Goal-based Exploration via Pruning Proto-goals
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too many). Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space, and an autonomous discovery process to refine this to a narrower space of controllable, reachable, novel, and relevant goals. The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.
Tuesday 22nd August
15:30-16:50
Planning and Scheduling (1/3)
#204
A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning
We consider the problem of risk-aware Markov Decision Processes (MDPs) for Safe AI. We introduce a theoretical framework, Extended Markov Ratio Decision Processes (EMRDP), that incorporates risk into MDPs and embeds environment learning into this framework. We propose an algorithm to find the optimal policy for EMRDP with theoretical guarantees. Under a certain monotonicity assumption, this algorithm runs in strongly-polynomial time both in the discounted and expected average reward models. We validate our algorithm empirically on a Grid World benchmark, evaluating its solution quality, required number of steps, and numerical stability. We find its solution quality to be stable under data noising, while its required number of steps grows with added noise. We observe its numerical stability compared to global methods.
#5098
Formal Explanations of Neural Network Policies for Planning
Deep learning is increasingly used to learn policies for planning problems. However, policies represented by neural networks are difficult to interpret, verify and trust. Existing formal approaches to post-hoc explanations provide concise reasons for a single decision made by an ML model. However, understanding planning policies require explaining sequences of decisions. In this paper, we formulate the problem of finding explanations for the sequence of decisions recommended by a learnt policy in a given state. We show that, under certain assumptions, a minimal explanation for a sequence can be computed by solving a number of single decision explanation problems which is linear in the length of the sequence. We present experimental results of our implementation of this approach for ASNet policies for classical planning domains.
#5308
Can I Really Do That? Verification of Meta-Operators via Stackelberg Planning
Macro-operators are a common reformulation method in planning that adds high-level operators corresponding to a fixed sequence of primitive operators. We introduce meta-operators, which allow using different sequences of actions in each state. We show how to automatically verify whether a meta-operator is valid, i.e., the represented behavior is always doable. This can be checked at once for all instantiations of the meta-operator and all reachable states via a compilation into Stackelberg planning, a form of adversarial planning. Our results show that meta-operators learned for multiple domains can often express useful high-level behaviors very compactly, improving planners’ performance.
#4248
Topological Planning with Post-unique and Unary Actions
We are interested in realistic planning problems to model the behavior of Non-Playable Characters (NPCs) in video games. Search-based action planning, introduced by the game F.E.A.R. in 2005, has an exponential time complexity allowing to control only a dozen NPCs between two frames. A close study of the plans generated in first-person shooters shows that: (1) actions are unary, (2) actions are contextually post-unique and (3) there is no two instances of the same action in an NPC’s plan. By considering (1), (2) and (3) as restrictions, we introduce new classes of problems with the Simplified Action Structure formalism which indeed allow to model realistic problems and whose instances are solvable by a linear-time algorithm. We also experimentally show that our algorithm is capable of managing millions of NPCs per frame.
#4071
Generalization through Diversity: Improving Unsupervised Environment Design
Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8×8 maze with three rooms, playing Chess on an 8×8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with a high potential to learn) thereby making agent training redundant on all but one of those environments. To that end, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in literature.
#4152
Model Predictive Control with Reach-avoid Analysis
In this paper we investigate the optimal controller synthesis problem, so that the system under the controller can reach a specified target set while satisfying given constraints. Existing model predictive control (MPC) methods learn from a set of discrete states visited by previous (sub-)optimized trajectories and thus result in computationally expensive mixed-integer nonlinear optimization. In this paper a novel MPC method is proposed based on reach-avoid analysis to solve the controller synthesis problem iteratively. The reach-avoid analysis is concerned with computing a reach-avoid set which is a set of initial states such that the system can reach the target set successfully. It not only provides terminal constraints, which ensure feasibility of MPC, but also expands discrete states in existing methods into a continuous set (i.e., reach-avoid sets) and thus leads to nonlinear optimization which is more computationally tractable online due to the absence of integer variables. Finally, we evaluate the proposed method and make comparisons with state-of-the-art ones based on several examples.
#3034
Minimizing Reachability Times on Temporal Graphs via Shifting Labels
We study how we can accelerate the spreading of information in temporal graphs via shifting operations; a problem that captures real-world applications varying from information flows to distribution schedules. In a temporal graph there is a set of fixed vertices and the available connections between them change over time in a predefined manner. We observe that, in some cases, shifting some connections, i.e., advancing or delaying them, can decrease the time required to reach from some vertex (source) to another vertex. We study how we can minimize the maximum time a set of sources needs to reach every vertex, when we are allowed to shift some of the connections. If we restrict the allowed number of changes, we prove that, already for a single source, the problem is NP-hard, and W[2]-hard when parameterized by the number of changes. Then we focus on unconstrained number of changes. We derive a polynomial-time algorithm when there is one source. When there are two sources, we show that the problem becomes NP-hard; on the other hand, we design an FPT algorithm parameterized by the treewidth of the graph plus the lifetime of the optimal solution, that works for any number of sources. Finally, we provide polynomial-time algorithms for several graph classes.
#4206
In Which Graph Structures Can We Efficiently Find Temporally Disjoint Paths and Walks?
A temporal graph has an edge set that may change over discrete time steps, and a temporal path (or walk) must traverse edges that appear at increasing time steps. Accordingly, two temporal paths (or walks) are temporally disjoint if they do not visit any vertex at the same time. The study of the computational complexity of finding temporally disjoint paths or walks in temporal graphs has recently been initiated by Klobas et al.. This problem is motivated by applications in multi-agent path finding (MAPF), which include robotics, warehouse management, aircraft management, and traffic routing.
We extend Klobas et al.’s research by providing parameterized hardness results for very restricted cases, with a focus on structural parameters of the so-called underlying graph. On the positive side, we identify sufficiently simple cases where we can solve the problem efficiently. Our results reveal some surprising differences between the “path version” and the “walk version” (where vertices may be visited multiple times) of the problem, and answer several open questions posed by Klobas et al.
Tuesday 22nd August
15:30-16:50
CV: Biomedical Image Analysis
#223
Appearance Prompt Vision Transformer for Connectome Reconstruction
Neural connectivity reconstruction aims to understand the function of biological reconstruction and promote basic scientific research. The intricate morphology and densely intertwined branches make it an extremely challenging task. Most previous best-performing methods adopt affinity learning or metric learning. Nevertheless, they either neglect to model explicit voxel semantics caused by implicit optimization or are hysteresis to spatial information. Furthermore, the inherent locality of 3D CNNs limits modeling long-range dependencies, leading to sub-optimal results. In this work, we propose a coherent and unified Appearance Prompt Vision Transformer (APViT) to integrate affinity and metric learning to exploit the complementarity by learning long-range spatial dependencies. The proposed APViT enjoys several merits. First, the extension continuity-aware attention module aims at constructing hierarchical attention customized for neuron extensibility and slice continuity to learn instance voxel semantic context from a global perspective and utilize continuity priors to enhance voxel spatial awareness. Second, the appearance prompt modulator is responsible for leveraging voxel-adaptive appearance knowledge conditioned on affinity rich in spatial information to instruct instance voxel semantics, exploiting the potential of affinity learning to complement metric learning. Extensive experimental results on multiple challenging benchmarks demonstrate that our APViT achieves consistent improvements with huge flexibility under the same post-processing strategy.
#5281
Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification
Multiple Instance Learning (MIL) and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification. However, unlike human pathologists who selectively observe specific regions of histopathology tissues under different magnifications, most methods do not incorporate multiple resolutions of the WSIs, hierarchically and attentively, thereby leading to a loss of focus on the WSIs and information from other resolutions. To resolve this issue, we propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs. This framework can dynamically and attentively discover the discriminative regions across multiple resolutions of the WSIs. Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer and obtain a more holistic WSI (bag) representation. This transformer consists of multiple Integrated Attention Modules, which is the combination of a transformer layer and an aggregation module that produces a bag representation based on every instance representation in that bag. The experimental results show that our method achieved state-of-the-art performances on multiple datasets, including Camelyon16, TCGA-RCC, TCGA-NSCLC, and an in-house IMGC dataset. The code is available at https://github.com/BearCleverProud/HAG-MIL.
#596
Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction
Magnetic resonance imaging (MRI) tasks often involve multiple contrasts. Recently, numerous deep learning-based multi-contrast MRI super-resolution (SR) and reconstruction methods have been proposed to explore the complementary information from the multi-contrast images. However, these methods either construct parameter-sharing networks or manually design fusion rules, failing to accurately model the correlations between multi-contrast images and lacking certain interpretations. In this paper, we propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm with a well-designed data fidelity term. Specifically, we bulid an observation model for the multi-contrast MR images to explicitly model the multi-contrast images as common features and unique features. In this way, only the useful information in the reference image can be transferred to the target image, while the inconsistent information will be ignored. We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model. Especially, the proximal operators are replaced by learnable ResNet. In addition, multi-scale dictionaries are introduced to further improve the model performance. We test our MC-CDic model on multi-contrast MRI SR and reconstruction tasks. Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods. Code is available at https://github.com/lpcccc-cv/MC-CDic.
#1588
Accurate MRI Reconstruction via Multi-Domain Recurrent Networks
In recent years, deep convolutional neural networks (CNNs) have become dominant in MRI reconstruction from undersampled k-space. However, most existing CNNs methods reconstruct the undersampled images either in the spatial domain or in the frequency domain, and neglecting the correlation between these two domains. This hinders the further reconstruction performance improvement. To tackle this issue, in this work, we propose a new multi-domain recurrent network (MDR-Net) with multi-domain learning (MDL) blocks as its basic units to reconstruct the undersampled MR image progressively. Specifically, the MDL block interactively processes the local spatial features and the global frequency information to facilitate complementary learning, leading to fine-grained features generation. Furthermore, we introduce an effective frequency-based loss to narrow the frequency spectrum gap, compensating for over-smoothness caused by the widely used spatial reconstruction loss. Extensive experiments on public fastMRI datasets demonstrate that our MDR-Net consistently outperforms other competitive methods and is able to provide more details.
#2228
CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation
The hybrid architecture of convolutional neural networks (CNNs) and Transformer are very popular for medical image segmentation. However, it suffers from two challenges. First, although a CNNs branch can capture the local image features using vanilla convolution, it cannot achieve adaptive feature learning. Second, although a Transformer branch can capture the global features, it ignores the channel and cross-dimensional self-attention, resulting in a low segmentation accuracy on complex-content images. To address these challenges, we propose a novel hybrid architecture of convolutional neural networks hand in hand with vision Transformers (CiT-Net) for medical image segmentation. Our network has two advantages. First, we design a dynamic deformable convolution and apply it to the CNNs branch, which overcomes the weak feature extraction ability due to fixed-size convolution kernels and the stiff design of sharing kernel parameters among different inputs. Second, we design a shifted-window adaptive complementary attention module and a compact convolutional projection. We apply them to the Transformer branch to learn the cross-dimensional long-term dependency for medical images. Experimental results show that our CiT-Net provides better medical image segmentation results than popular SOTA methods. Besides, our CiT-Net requires lower parameters and less computational costs and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/CiT-Net.
#3709
Sub-Band Based Attention for Robust Polyp Segmentation
This article proposes a novel spectral domain based solution to the challenging polyp segmentation. The main contribution is based on an interesting finding of the significant existence of the middle frequency sub-band during the CNN process. Consequently, a Sub-Band based Attention (SBA) module is proposed, which uniformly adopts either the high or middle sub-bands of the encoder features to boost the decoder features and thus concretely improve the feature discrimination. A strong encoder supplying informative sub-bands is also very important, while we highly value the local-and-global information enriched CNN features. Therefore, a Transformer Attended Convolution (TAC) module as the main encoder block is introduced. It takes the Transformer features to boost the CNN features with stronger long-range object contexts. The combination of SBA and TAC leads to a novel polyp segmentation framework, SBA-Net. It adopts TAC to effectively obtain encoded features which also input to SBA, so that efficient sub-bands based attention maps can be generated for progressively decoding the bottleneck features. Consequently, SBA-Net can achieve the robust polyp segmentation, as the experimental results demonstrate.
#1510
Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification
Mammogram image is important for breast cancer screening, and typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information for clinical decisions. However, previous methods mostly learn features from the two views independently, which violates the clinical knowledge and ignores the importance of dual-view correlation in the feature learning. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep feature maps for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly-correlated with each other. Experimental results on the two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that the DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms previous state-of-the-art methods for classifying a whole mammogram as malignant or not.
#307
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
The performance of existing supervised neuron segmentation methods is highly dependent on the amount of accurate annotations, especially when applied to large scale electron microscope (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pre-training inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model is able to capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
Tuesday 22nd August
15:30-16:50
Computer Vision (1/6)
#1529
Image Composition with Depth Registration
Handling occlusions is still a challenging problem for image composition. It always requires the source contents to be completely in front of the target contents or needs manual interventions to adjust occlusions, which is very tedious. Though several methods have suggested exploiting priors or learning techniques for promoting occlusion determination, their potentials are much limited. This paper addresses the challenge by presenting a depth registration method for merging the source contents seamlessly into the 3D space that the target image represents. Thus, the occlusions between the source contents and target contents can be conveniently handled through pixel-wise depth comparisons, allowing the user to more efficiently focus on the designs for image composition. Experimental results show that we can conveniently handle occlusions in image composition and improve efficiency by about 4 times compared to Photoshop.
#2611
RuleMatch: Matching Abstract Rules for Semi-supervised Learning of Human Standard Intelligence Tests
Raven’s Progressive Matrices (RPM), one of the standard intelligence tests in human psychology, has recently emerged as a powerful tool for studying abstract visual reasoning (AVR) abilities in machines. Although existing computational models for RPM problems achieve good performance, they require a large number of labeled training examples for supervised learning. In contrast, humans can efficiently solve unlabeled RPM problems after learning from only a few example questions. Here, we develop a semi-supervised learning (SSL) method, called RuleMatch, to train deep models with a small number of labeled RPM questions along with other unlabeled questions. Moreover, instead of using pixel-level augmentation in object perception tasks, we exploit the nature of RPM problems and augment the data at the level of abstract rules. Specifically, we disrupt the possible rules contained among context images in an RPM question and force the two augmented variants of the same unlabeled sample to obey the same abstract rule and predict a common pseudo label for training. Extensive experiments show that the proposed RuleMatch achieves state-of-the-art performance on two popular RAVEN datasets. Our work makes an important stride in aligning abstract analogical visual reasoning abilities in machines and humans. Our Code is at https://github.com/ZjjConan/AVR-RuleMatch.
#1361
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order “joint$\leftrightarrow$joint”, second-order “bone$\leftrightarrow$joint”, and high-order “hyperbone$\leftrightarrow$joint” interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is in https://github.com/hyer/HDFormer
#5126
Learning Attention from Attention: Efficient Self-Refinement Transformer for Face Super-Resolution
Recently, Transformer-based architecture has been introduced into face super-resolution task due to its advantage in capturing long-range dependencies. However, these approaches tend to integrate global information in a large searching region, which neglect to focus on the most relevant information and induce blurry effect by the irrelevant textures. Some improved methods simply constrain self-attention in a local window to suppress the useless information. But it also limits the capability of recovering high-frequency details when flat areas dominate the local searching window. To improve the above issues, we propose a novel self-refinement mechanism which could adaptively achieve texture-aware reconstruction in a coarse-to-fine procedure. Generally, the primary self-attention is first conducted to reconstruct the coarse-grained textures and detect the fine-grained regions required further compensation. Then, region selection attention is performed to refine the textures on these key regions. Since self-attention considers the channel information on tokens equally, we employ a dual-branch feature integration module to privilege the important channels in feature extraction. Furthermore, we design the wavelet fusion module which integrate shallow-layer structure and deep-layer detailed feature to recover realistic face images in frequency domain. Extensive experiments demonstrate the effectiveness on a variety of datasets.
#3271
IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation
Knowledge distillation (KD) is an effective method for transferring the knowledge of a teacher model to a student model, that aims to improve the latter’s performance efficiently. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements with various tasks, only marginal improvements are shown in student networks due to their limited model capacity. In this work, to address the student model’s limitation, we propose a novel flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) to improve the overall performance of the student model by directly distilling the teacher’s knowledge into branches of student models. The generated output of IFD, which is trained by the teacher model, is effectively combined by attentive logit. We use only a few blocks of the student and the trained IFD during inference, requiring an equal or less number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods with a large margin over the various datasets in different tasks without extra computation.
#2738
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Decision-based methods have shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require to access the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it will directly affect the query efficiency. Recent works have attempted to utilize gradient priors to facilitate score-based methods to obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, thus are difficult to simply extend to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus remaining the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate the convergence speed, thus improving the query efficiency. Extensive experiments have demonstrated that the proposed method outperforms other strong baselines significantly.
#251
Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation
Most existing ultra-high resolution (UHR) segmentation methods always struggle in the dilemma of balancing memory cost and local characterization accuracy, which are both taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) that achieves impressive performances. In this work, GPWFormer is a Transformer (T)-CNN (C) mutual leaning framework, where T takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while C takes downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, T partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. In addition, masks produced by C are utilized to guide the patch grouping process, providing a heuristics decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.
Tuesday 22nd August
15:30-16:50
Data Mining (2/3)
#4961
Capturing the Long-Distance Dependency in the Control Flow Graph via Structural-Guided Attention for Bug Localization
To alleviate the burden of software maintenance, bug localization, which aims to automatically locate the buggy source files based on the bug report, has drawn significant attention in the software mining community. Recent studies indicate that the program structure in source code carries more semantics reflecting the program behavior, which is beneficial for bug localization. Benefiting from the rich structural information in the Control Flow Graph (CFG), CFG-based bug localization methods have achieved the state-of-the-art performance. Existing CFG-based methods extract the semantic feature from the CFG via the graph neural network. However, the step-wise feature propagation in the graph neural network suffers from the problem of information loss when the propagation distance is long, while the long-distance dependency is rather common in the CFG. In this paper, we argue that the long-distance dependency is crucial for feature extraction from the CFG, and propose a novel bug localization model named sgAttention. In sgAttention, a particularly designed structural-guided attention is employed to globally capture the information in the CFG, where features of irrelevant nodes are masked for each node to facilitate better feature extraction from the CFG. Experimental results on four widely-used open-source software projects indicate that sgAttention averagely improves the state-of-the-art bug localization methods by 32.9\% and 29.2\% and the state-of-the-art pre-trained models by 5.8\% and 4.9\% in terms of MAP and MRR, respectively.
#2051
Continuous-Time Graph Learning for Cascade Popularity Prediction
Information propagation on social networks could be modeled as cascades, and many efforts have been made to predict the future popularity of cascades. However, most of the existing research treats a cascade as an individual sequence. Actually, the cascades might be correlated with each other due to the shared users or similar topics. Moreover, the preferences of users and semantics of a cascade are usually continuously evolving over time. In this paper, we propose a continuous-time graph learning method for cascade popularity prediction, which first connects different cascades via a universal sequence of user-cascade and user-user interactions and then chronologically learns on the sequence by maintaining the dynamic states of users and cascades. Specifically, for each interaction, we present an evolution learning module to continuously update the dynamic states of the related users and cascade based on their currently encoded messages and previous dynamic states. We also devise a cascade representation learning component to embed the temporal information and structural information carried by the cascade. Experiments on real-world datasets demonstrate the superiority and rationality of our approach.
#2241
Uncovering the Largest Community in Social Networks at Scale
The Maximum k-Plex Search (MPS) can find the largest k-plex, which is a generalization of the largest clique.
Although MPS is commonly used in AI to effectively discover real-world communities of social networks, existing MPS algorithms suffer from high computational costs because they iteratively scan numerous nodes to find the largest k-plex.
Here, we present an efficient MPS algorithm called Branch-and-Merge (BnM), which outputs an exact maximum k-plex.
BnM merges unnecessary nodes to explore a smaller graph than the original one.
Extensive evaluations on real-world social networks demonstrate that BnM significantly outperforms other state-of-the-art MPS algorithms in terms of running time.
#5014
Adaptive Path-Memory Network for Temporal Knowledge Graph Reasoning
Temporal knowledge graph (TKG) reasoning aims to predict the future missing facts based on historical information and has gained increasing research interest recently. Lots of works have been made to model the historical structural and temporal characteristics for the reasoning task. Most existing works model the graph structure mainly depending on entity representation. However, the magnitude of TKG entities in real-world scenarios is considerable, and an increasing number of new entities will arise as time goes on. Therefore, we propose a novel architecture modeling with relation feature of TKG, namely aDAptivE path-MemOry Network (DaeMon), which adaptively models the temporal path information between query subject and each object candidate across history time. It models the historical information without depending on entity representation. Specifically, DaeMon uses path memory to record the temporal path information derived from path aggregation unit across timeline considering the memory passing strategy between adjacent timestamps. Extensive experiments conducted on four real-world TKG datasets demonstrate that our proposed model obtains substantial performance improvement and outperforms the state-of-the-art up to 4.8% absolute in MRR.
#4973
Exploiting Non-Interactive Exercises in Cognitive Diagnosis
Cognitive Diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical students-exercise interaction logs to access proficiency levels. Despite effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail consisting of students with few records lacks supervision signals. This phenomenon leads to inferior diagnosis among few records students. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order between observed and unobserved responses as auxiliary ranking-based training signals to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses for effective supplementary training. Then, we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the supremacy of our framework in long-tailed data.
#1833
Probabilistic Masked Attention Networks for Explainable Sequential Recommendation
Transformer-based models are powerful for modeling temporal dynamics of user preference in sequential recommendation. Most of the variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attentions inevitably assign probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. Here we propose a Probabilistic Masked Attention Network (PMAN) to identify the sparse pattern of attentions, which is more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attentions under a constrained optimization framework. As such, PMAN allows to select which information is critical to be retained or dropped in a data-driven fashion. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly.
#4229
Targeting Minimal Rare Itemsets from Transaction Databases
The computation of minimal rare itemsets is a well known task in data mining, with numerous applications, e.g., drugs effects analysis and network security, among others. This paper presents a novel approach to the computation of minimal rare itemsets. First, we introduce a generalization of the traditional minimal rare itemset model called k-minimal rare itemset. A k-minimal rare itemset is defined as an itemset that becomes frequent or rare based on the removal of at least k or at most (k − 1) items from it. We claim that our work is the first to propose this generalization in the field of data mining. We then present a SAT-based framework for efficiently discovering k-minimal rare itemsets from large transaction databases. Afterwards, by partitioning the k-minimal rare itemset mining problem into smaller sub-problems, we aim to make it more manageable and easier to solve. Finally, to evaluate the effectiveness and efficiency of our approach, we conduct extensive experimental analysis using various popular datasets. We compare our method with existing specialized algorithms and CP-based algorithms commonly used for this task.
#SC5
Learning Causal Effects on Hypergraphs (Extended Abstract)
Hypergraphs provide an effective abstraction for modeling multi-way group interactions among nodes, where each hyperedge can connect any number of nodes. Different from most existing studies which leverage statistical dependencies, we study hypergraphs from the perspective of causality. Specifically, we focus on the problem of individual treatment effect (ITE) estimation on hypergraphs, aiming to estimate how much an intervention (e.g., wearing face covering) would causally affect an outcome (e.g., COVID-19 infection) of each individual node. Existing works on ITE estimation either assume that the outcome of one individual should not be influenced by the treatment of other individuals (i.e., no interference), or assume the interference only exists between connected individuals in an ordinary graph. We argue that these assumptions can be unrealistic on real-world hypergraphs, where higher-order interference can affect the ITE estimations due to group interactions. We investigate high-order interference modeling, and propose a new causality learning framework powered by hypergraph neural networks. Extensive experiments on real-world hypergraphs verify the superiority of our framework over existing baselines.
Tuesday 22nd August
15:30-16:50
Multidisciplinary Topics and Applications (2/4)
#2877
SemiGNN-PPI: Self-Ensembling Multi-Graph Neural Network for Efficient and Generalizable Protein–Protein Interaction Prediction
Protein-protein interactions (PPIs) are crucial in various biological processes and their study has significant implications for drug development and disease diagnosis. Existing deep learning methods suffer from significant performance degradation under complex real-world scenarios due to various factors, e.g., label scarcity and domain shift. In this paper, we propose a self-ensembling multi-graph neural network (SemiGNN-PPI) that can effectively predict PPIs while being both efficient and generalizable. In SemiGNN-PPI, we not only model the protein correlations but explore the label dependencies by constructing and processing multiple graphs from the perspectives of both features and labels in the graph learning process. We further marry GNN with Mean Teacher to effectively leverage unlabeled graph-structured PPI data for self-ensemble graph learning. We also design multiple graph consistency constraints to align the student and teacher graphs in the feature embedding space, enabling the student model to better learn from the teacher model by incorporating more relationships. Extensive experiments on PPI datasets of different scales with different evaluation settings demonstrate that SemiGNN-PPI outperforms state-of-the-art PPI prediction methods, particularly in challenging scenarios such as training with limited annotations and testing on unseen data.
#1412
Towards Generalizable Reinforcement Learning for Trade Execution
Optimized trade execution is to sell (or buy) a given amount of assets in a given time with the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model the optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance.
#58
HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE
Factor model is a fundamental investment tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in practical complicated investing situations. However, it is still an open question to build a factor model that can conduct stock prediction in an online and adaptive setting, where the model can adapt itself to match the current market regime identified based on only point-in-time market information. To tackle this problem, we propose the first deep learning based online and adaptive factor model, HireVAE, at the core of which is a hierarchical latent space that embeds the underlying relationship between the market situation and stock-wise latent factors, so that HireVAE can effectively estimate useful latent factors given only historical market information and subsequently predict accurate stock returns. Across four commonly used real stock market benchmarks, the proposed HireVAE demonstrate superior performance in terms of active returns over previous methods, verifying the potential of such online and adaptive factor model.
#3526
Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow
In financial scenarios, influenced by common factors such as global macroeconomic and sector-specific factors, stocks exhibit varying degrees of correlations with each other, which is essential in risk-averse portfolio allocation. Because the real risk matrix is unobservable, the covariance-based correlation matrix is widely used for constructing diversified stock portfolios. However, studies have seldom focused on dynamic correlation matrix estimation under the non-stationary financial market. Moreover, as the number of stocks in the market grows, existing correlation matrix estimation methods face more serious challenges with regard to efficiency and effectiveness. In this paper, we propose a novel hash-based dynamic correlation forecasting model (HDCF) to estimate dynamic stock correlations. Under structural assumptions on the correlation matrix, HDCF learns the hash representation based on normalizing flows instead of the real-valued representation, which performs extremely efficiently in high-dimensional settings. Experiments show that our proposed model outperforms baselines on portfolio decisions in terms of effectiveness and efficiency.
#2693
MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation
Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made by resorting to generative models for graphs. However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models. The primary challenge to hierarchical generation is the non-differentiable issue caused by the generation of intermediate discrete coarsened graph structures. To sidestep this issue, we cast the tricky hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, implying its high capacity to model data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.
#4097
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as Starcraft, Go, Chess etc. However, knowledge transfer from Artificial “Experts” to humans remain a significant challenge. A promising avenue for such transfer would be the use of curricula. Recent methods in curricula generation focuses on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress, and are not suited for training robots in the real world (or more ambitiously humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model difficulty of environments and ability of RL agents directly. Given that RL agents and humans are trained more efficiently under the “zone of proximal development”, our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM’s ability to represent the environment parameter space, and training with RL agents with PERM produces a strong performance in deterministic environments. Lastly, we show that our method is transferable between students, without any sacrifice in training quality.
#830
InitLight: Initial Model Generation for Traffic Signal Control Using Adversarial Inverse Reinforcement Learning
Due to repetitive trial-and-error style interactions between agents and a fixed traffic environment during the policy learning, existing Reinforcement Learning (RL)-based Traffic Signal Control (TSC) methods greatly suffer from long RL training time and poor adaptability of RL agents to other complex traffic environments. To address these problems, we propose a novel Adversarial Inverse Reinforcement Learning (AIRL)-based pre-training method named InitLight, which enables effective initial model generation for TSC agents. Unlike traditional RL-based TSC approaches that train a large number of agents simultaneously for a specific multi-intersection environment, InitLight pre-trains only one single initial model based on multiple single-intersection environments together with their expert trajectories. Since the reward function learned by InitLight can recover ground-truth TSC rewards for different intersections at optimality, the pre-trained agent can be deployed at intersections of any traffic environments as initial models to accelerate subsequent overall global RL training. Comprehensive experimental results show that, the initial model generated by InitLight can not only significantly accelerate the convergence with much fewer episodes, but also own superior generalization ability to accommodate various kinds of complex traffic environments.
#2607
A Generalized Deep Markov Random Fields Framework for Fake News Detection
Recently, the wanton dissemination of fake news on social media has adversely affected our lives, rendering automatic fake news detection a pressing issue. Current methods are often fully supervised and typically employ deep neural networks (DNN) to learn implicit relevance from labeled data, ignoring explicitly shared properties (e.g., inflammatory expressions) across fake news. To address this limitation, we propose a graph-theoretic framework, called Generalized Deep Markov Random Fields Framework (GDMRFF), that inherits the capability of deep learning while at the same time exploiting the correlations among the news articles (including labeled and unlabeled data). Specifically, we first leverage a DNN-based module to learn implicit relations, which we then reveal as the unary function of MRF. Pairwise functions with refining effects to encapsulate human insights are designed to capture the explicit association among all samples. Meanwhile, an event removal module is introduced to remove event impact on pairwise functions. Note that we train GDMRFF with the semi-supervised setting, which decreases the reliance on labeled data while maximizing the potential of unlabeled data. We further develop an Ambiguity Learning Guided MRF (ALGM) model as a concretization of GDMRFF. Experiments show that ALGM outperforms the compared methods significantly on two datasets, especially when labeled data is limited.
Tuesday 22nd August
15:30-16:50
Natural Language Processing (1/4)
#3349
Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation
Modern Natural Language Processing (NLP) models expose under-sensitivity towards text rubbish examples. The text rubbish example is the heavily modified input text which is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes deliver human-readable text. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish sample generation with two major merits. Firstly, the AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which brings less degradation to the model’s confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to jump out the local optima with probabilities. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets manifest the superiority of AGPS compared with the baseline and expose the fact: the NLP models can not really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences.
#3835
Regularisation for Efficient Softmax Parameter Generation in Low-Resource Text Classifiers
Meta-learning has made tremendous progress in recent years and was demonstrated to be particularly suitable in low-resource settings where training data is very limited. However, meta-learning models still require large amounts of training tasks to achieve good generalisation. Since labelled training data may be sparse, self-supervision-based approaches are able to further improve performance on downstream tasks. Although no labelled data is necessary for this training, a large corpus of unlabelled text needs to be available.
In this paper, we improve on recent advances in meta-learning for natural language models that allow training on a diverse set of training tasks for few-shot, low-resource target tasks. We introduce a way to generate new training data with the need for neither more supervised nor unsupervised datasets. We evaluate the method on a diverse set of NLP tasks and show that the model decreases in performance when trained on this data without further adjustments. Therefore, we introduce and evaluate two methods for regularising the training process and show that they not only improve performance when used in conjunction with the new training data but also improve average performance when training only on the original data, compared to the baseline.
#5176
Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video
Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e.g., summary), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with the visual information, which is often obtained through a general-purpose visual feature extractor. However, the generally extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning the summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge that distills from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can have a significant improvement on small datasets or even datasets with limited training data.
#4148
ScriptWorld: Text Based Environment for Learning Procedural Knowledge
Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop reinforcement learning based baseline models/agents to play the games in ScriptWorld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments.
#928
PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification
Event Causality Identification (ECI) aims to identify the causality between a pair of event mentions in a document, which is composed of sentence-level ECI (SECI) and document-level ECI (DECI). Previous work applies various reasoning models to identify the implicit event causality. However, they indiscriminately reason all event causality in the same way, ignoring that most inter-sentence event causality depends on intra-sentence event causality to infer. In this paper, we propose a progressive graph pairwise attention network (PPAT) to consider the above dependence. PPAT applies a progressive reasoning strategy, as it first predicts the intra-sentence event causality, and then infers the more implicit inter-sentence event causality based on the SECI result. We construct a sentence boundary event relational graph, and PPAT leverages a simple pairwise attention mechanism, which attends to different reasoning chains on the graph. In addition, we propose a causality-guided training strategy for assisting PPAT in learning causality-related representations on every layer. Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank). Code is available at https://github.com/HITsz-TMG/PPAT.
#2562
Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation
Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model’s capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and cut and mix the subsentences between the misclassified and correctly classified samples wisely. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. In the end, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective with higher explainability than the typical classification models, hidden-level, and input-level mixup models.
#SV5654
Recent Advances in Direct Speech-to-text Translation
Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation aiming to summarize the current state-of-the-art techniques. First, we categorize the existing research work into three directions based on the main challenges — modeling burden, data scarcity, and application issues. To tackle the problem of modeling burden, two main structures have been proposed, encoder-decoder framework (Transformer and the variants) and multitask frameworks. For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We analyze and summarize the application issues, which include real-time, segmentation, named entity, gender bias, and code-switching. Finally, we discuss some promising directions for future work.
#5260
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) research has gained a great success recently by improving the noise-robustness of audio-only automatic speech recognition (ASR) with noise-invariant visual information. However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for downstream speech recognition task. In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. Specifically, we design a global interaction model to capture the A-V complementary relationship on modality level, as well as a local alignment approach to model the A-V temporal consistency on frame level. Such a holistic view of cross-modal correlations enable better multimodal representations for AVSR. Experiments on public benchmarks LRS3 and LRS2 show that our GILA outperforms the supervised learning state-of-the-art. Code is at https://github.com/YUCHEN005/GILA.
Tuesday 22nd August
15:30-16:50
GTEP: Mechanism Design
#53
Non-Obvious Manipulability in Extensive-Form Mechanisms: The Revelation Principle for Single-Parameter Agents
Recent work in algorithmic mechanism design focuses on designing mechanisms for agents with bounded rationality, modifying the constraints that must be satisfied in order to achieve incentive compatibility. Starting with Li’s strengthening of strategyproofness, obvious strategyproofness (OSP) requires truthtelling to be “obvious” over dishonesty, roughly meaning that the worst outcome from truthful actions must be no worse than the best outcome for dishonest ones. A celebrated result for dominant-strategy incentive-compatible mechanisms that allows us to restrict attention to direct mechanisms, known as the revelation principle, does not hold for OSP: the implementation details matter for the obvious incentive properties of the mechanism. Studying agent strategies in real-life mechanisms, Troyan and Morrill introduce a relaxation of strategyproofness known as non-obvious manipulability, which only requires comparing certain extrema of the agents’ utility functions in order for a mechanism to be incentive-compatible. Specifically a mechanism is not obviously manipulable (NOM) if the best and worst outcomes when acting truthfully are no worse than the best and worst outcomes when acting dishonestly. In this work we first extend the cycle monotonicity framework for direct-revelation NOM mechanism design to indirect mechanisms. We then apply this to two settings, single-parameter agents and mechanisms for two agents in which one has a two-value domain, and show that under these models the revelation principle holds: direct mechanisms are just as powerful as indirect ones.
#1756
Delegated Online Search
In a delegation problem, a principal P with commitment power tries to pick one out of n options. Each option is drawn independently from a known distribution. Instead of inspecting the options herself, P delegates the information acquisition to a rational and self-interested agent A. After inspection, A proposes one of the options, and P can accept or reject. In this paper, we study a natural online variant of delegation, in which the agent searches through the options in an online fashion. How can we design algorithms for P that approximate the utility of her best option in hindsight?
We show that P can obtain a Θ(1/n)-approximation and provide more fine-grained bounds independent of n based on two parameters. If the ratio of maximum and minimum utility for A is bounded by a factor α, we obtain an Ω(log log α / log α)-approximation algorithm and show that this is best possible. If P cannot distinguish options with the same value for herself, we show that ratios polynomial in 1/α cannot be avoided. If the utilities of P and A for each option are related by a factor β, we obtain an Ω(1 / log β)-approximation, and O(log log β / log β) is best possible.
#5088
Revenue Maximization Mechanisms for an Uninformed Mediator with Communication Abilities
Consider a market where a seller owns an item for sale and a buyer wants to purchase it. Each player has private information, known as their type. It can be costly and difficult for the players to reach an agreement through direct communication. However, with a mediator as a trusted third party, both players can communicate privately with the mediator without worrying about leaking too much or too little information. The mediator can design and commit to a multi-round communication protocol for both players, in which they update their beliefs about the other player’s type. The mediator cannot force the players to trade but can influence their behaviors by sending messages to them.
We study the problem of designing revenue-maximizing mechanisms for the mediator. We show that the mediator can, without loss of generality, focus on a set of direct and incentive-compatible mechanisms. We then formulate this problem as a mathematical program and provide an optimal solution in closed form under a regularity condition. Our mechanism is simple and has a threshold structure. We also discuss some interesting properties of the optimal mechanism, such as situations where the mediator may lose money.
#1716
Exploring Leximin Principle for Fair Core-Selecting Combinatorial Auctions: Payment Rule Design and Implementation
Core-selecting combinatorial auctions (CAs) restrict the auction result in the core such that no coalitions could improve their utilities by engaging in collusion. The minimum-revenue-core (MRC) rule is a widely used core-selecting payment rule to maximize the total utilities of all bidders. However, the MRC rule can suffer from severe unfairness since it ignores individuals’ utilities. To address this limitation, we propose to explore the leximin principle to achieve fairness in core-selecting CAs since the leximin principle prefers to maximize the utility of the worst-off; the resulting bidder-leximin-optimal (BLO) payment rule is then theoretically analyzed and an effective algorithm is further provided to compute the BLO outcome. Moreover, we conduct extensive experiments to show that our algorithm returns fairer utility distributions and is faster than existing algorithms of core-selecting payment rules.
#2120
Truthful Auctions for Automated Bidding in Online Advertising
Automated bidding, an emerging intelligent decision-making paradigm powered by machine learning, has become popular in online advertising. Advertisers in automated bidding evaluate the cumulative utilities and have private financial constraints over multiple ad auctions in a long-term period. Based on these distinct features, we consider a new ad auction model for automated bidding: the values of advertisers are public while the financial constraints, such as budget and return on investment (ROI) rate, are private types. We derive the truthfulness conditions with respect to private constraints for this multi-dimensional setting, and demonstrate any feasible allocation rule could be equivalently reduced to a series of non-decreasing functions on budget. However, the resulted allocation mapped from these non-decreasing functions generally follows an irregular shape, making it difficult to obtain a closed-form expression for the auction objective. To overcome this design difficulty, we propose a family of truthful automated bidding auction with personalized rank scores, similar to the Generalized Second-Price (GSP) auction. The intuition behind our design is to leverage personalized rank scores as the criteria to allocate items, and compute a critical ROI to transforms the constraints on budget to the same dimension as ROI. The experimental results demonstrate that the proposed auction mechanism outperforms the widely used ad auctions, such as first-price auction and second-price auction, in various automated bidding environments.
#1045
Learning Efficient Truthful Mechanisms for Trading Networks
Trading networks are an indispensable part of today’s economy, but to compete successfully with others, they must be efficient in maximizing the value they provide to the external market. While the prior work relies on truthful disclosure of private information to achieve efficiency, we study the problem of designing mechanisms that result in efficient trading networks by incentivizing firms to truthfully reveal their private information to a third party. Additional desirable properties of such mechanisms are weak budget balance (WBB; the third party needs not invest) and individual rationality (IR; firms get non-negative utility). Unlike combinatorial auctions, there may not exist mechanisms that simultaneously satisfy these properties ex post for trading networks. We propose an approach for computing or learning truthful and efficient mechanisms for given networks in a Bayesian setting, where WBB and IR, respectively, are relaxed to ex ante and interim for a given distribution over the private information. We incorporate techniques to reduce computational and sample complexity. We empirically demonstrate that the proposed approach successfully finds the mechanisms with the relaxed properties for trading networks where achieving ex post properties is impossible.
#1633
Differentiable Economics for Randomized Affine Maximizer Auctions
A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, such an architecture does not exist. There are single-bidder approaches (MenuNet, RochetNet) which are always strategyproof and can represent optimal mechanisms. RegretNet is multi-bidder and can approximate any mechanism, but is only approximately strategyproof. We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism. This architecture is the classic affine maximizer auction (AMA), modified to offer lotteries. By using the gradient-based optimization tools of differentiable economics, we can now train lottery AMAs, competing with or outperforming prior approaches in revenue.
#SV5488
Game-theoretic Mechanisms for Eliciting Accurate Information
Artificial Intelligence often relies on information obtained from others through crowdsourcing, federated learning, or data markets. It is crucial to ensure that this data is accurate. Over the past 20 years, a variety of incentive mechanisms have been developed that use game theory to reward the accuracy of contributed data. These techniques are applicable to many settings where AI uses contributed data.
This survey categorizes the different techniques and their properties, and shows their limits and tradeoffs. It identifies open issues and points to possible directions to address these.
Tuesday 22nd August
15:30-16:50
AI Ethics, Trust, Fairness (1/3)
#4518
Group Fairness in Set Packing Problems
Kidney exchange programs (KEPs) typically seek to match incompatible patient-donor pairs based on a utilitarian objective where the number or overall quality of transplants is maximized—implicitly penalizing certain classes of difficult to match (e.g., highly-sensitized) patients. Prioritizing the welfare of highly-sensitized (hard-to-match) patients has been studied in several previous works [Roth et al., 2005; McElfresh and Dickerson, 2018; Farnadi et al., 2021] as a natural fairness criterion. We formulate the KEP problem as k-set packing (inspired from the works of [Biro et al., 2009; Lin et al., 2019]) with a probabilistic group fairness notion of proportionality fairness—namely, fair k-set packing (FairSP). In this work we propose algorithms that take arbitrary proportionality vectors (i.e., policy-informed demands of how to prioritize different groups) and return a probabilistically fair solution with provable guarantees. Our main contributions are randomized algorithms as well as hardness results for FairSP variants. Additionally, the tools we introduce serve to audit the price of fairness involved in prioritizing different groups in realistic KEPs and other k-set packing applications. We conclude with experiments on synthetic and realistic kidney exchange FairSP instances.
#4492
Incentivizing Recourse through Auditing in Strategic Classification
The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits.
#SC1
Causal Conceptions of Fairness and their Consequences.
Show Abstract
Hide Abstract
#534
On the Fairness Impacts of Private Ensembles Models
The Private Aggregation of Teacher Ensembles (PATE) is a machine learning framework that enables the creation of private models through the combination of multiple “teacher” models and a “student” model. The student model learns to predict an output based on the voting of the teachers, and the resulting model satisfies differential privacy. PATE has been shown to be effective in creating private models in semi-supervised settings or when protecting data labels is a priority.
This paper explores whether the use of PATE can result in unfairness, and demonstrates that it can lead to accuracy disparities among groups of individuals. The paper also analyzes the algorithmic and data properties that contribute to these disproportionate impacts, why these aspects are affecting different groups disproportionately, and offers recommendations for mitigating these effects.
#1812
Quantifying Harm
In earlier work we defined a qualitative notion of harm: either harm is caused, or it is not. For practical applications, we often need to quantify harm; for example, we may want to choose the least harmful of a set of possible interventions. We first present a quantitative definition of harm in a deterministic context involving a single individual, then we consider the issues involved in dealing with uncertainty regarding the context and going from a notion of harm for a single individual to a notion of “societal harm”, which involves aggregating the harm to individuals. We show that the “obvious” way of doing this (just taking the expected harm for an individual and then summing the expected harm over all individuals) can lead to counterintuitive or inappropriate answers, and discuss alternatives, drawing on work from the decision-theory literature.
#1875
Advancing Post-Hoc Case-Based Explanation with Feature Highlighting
Explainable AI (XAI) has been proposed as a valuable tool to assist in downstream tasks involving human-AI collaboration. Perhaps the most psychologically valid XAI techniques are case-based approaches which display “whole” exemplars to explain the predictions of black-box AI systems. However, for such post-hoc XAI methods dealing with images, there has been no attempt to improve their scope by using multiple clear feature “parts” of the images to explain the predictions while linking back to relevant cases in the training data, thus allowing for more comprehensive explanations that are faithful to the underlying model. Here, we address this gap by proposing two general algorithms (latent and superpixel-based) which can isolate multiple clear feature parts in a test image, and then connect them to the explanatory cases found in the training data, before testing their effectiveness in a carefully designed user study. Results demonstrate that the proposed approach appropriately calibrates a user’s feelings of “correctness” for ambiguous classifications in real world data on the ImageNet dataset, an effect which does not happen when just showing the explanation without feature highlighting.
#3895
Towards Robust Gan-Generated Image Detection: A Multi-View Completion Representation
GAN-generated image detection now becomes the first line of defense against the malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well in detecting clean, known GAN samples, their success is largely attributable to overfitting unstable features such as frequency artifacts, which will cause failures when facing unknown GANs or perturbation attacks. To overcome the issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image tasks to model the diverse distributions of genuine images. Frequency-irrelevant features can be represented from the distributional discrepancies characterized by the completion models, which are stable, generalized, and robust for detecting unknown fake patterns. Then, a multi-view classification is devised with elaborated intra- and inter-view learning strategies to enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluated the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks. The results confirm our method’s improved effectiveness, generalization, and robustness over various baselines.
Tuesday 22nd August
15:30-16:50
Knowledge Representation and Reasoning (2/4)
#222
Shhh! The Logic of Clandestine Operations
An operation is called covert if it conceals the identity of the actor; it is called clandestine if the very fact that the operation is conducted is concealed. The paper proposes a formal semantics of clandestine operations and introduces a sound and complete logical system that describes the interplay between the distributed knowledge modality and a modality capturing coalition power to conduct clandestine operations.
#4504
Explaining Answer-Set Programs with Abstract Constraint Atoms
Answer-Set Programming (ASP) is a popular declarative reasoning and problem solving formalism. Due to the increasing interest in explainabilty, several explanation approaches have been developed for ASP. However, support for commonly used advanced language features of ASP, as for example aggregates or choice rules, is still mostly lacking. We deal with explaining ASP programs containing Abstract Constraint Atoms, which encompass the above features and others. We provide justifications for the presence, or absence, of an atom in a given answer-set. To this end, we introduce several formal notions of justification in this setting based on the one hand on a semantic characterisation utilising minimal partial models, and on the other hand on a more ruled-guided approach. We provide complexity results for checking and computing such justifications, and discuss how the semantic and syntactic approaches relate and can be jointly used to offer more insight.
Our results contribute to a basis for explaining commonly used language features and thus increase accessibility and usability of ASP as an AI tool.
#3761
Tractable Diversity: Scalable Multiperspective Ontology Management via Standpoint EL
The tractability of the lightweight description logic EL has allowed for the construction of large and widely used ontologies that support semantic interoperability. However, comprehensive domains with a broad user base are often at odds with strong axiomatisations otherwise useful for inferencing, since these are usually context dependent and subject to diverging perspectives.
In this paper we introduce Standpoint EL, a multi-modal extension of EL that allows for the integrated representation of domain knowledge relative to diverse, possibly conflicting standpoints (or contexts), which can be hierarchically organised and put in relation to each other. We establish that Standpoint EL still exhibits EL’s favourable PTime standard reasoning, whereas introducing additional features like empty standpoints, rigid roles, and nominals makes standard reasoning tasks intractable.
#J5940
Incremental Event Calculus for Run-Time Reasoning (Extended Abstract)
We present a system for online, incremental composite event recognition. In streaming environments, the usual case is for data to arrive with a (variable) delay from, and to be revised by, the underlying sources. We propose RTEC_inc, an incremental version of RTEC, a composite event recognition engine with formal, declarative semantics, that has been shown to scale to several real-world data streams. RTEC deals with delayed arrival and revision of events by computing all queries from scratch. This is often inefficient since it results in redundant computations. Instead, RTEC_inc deals with delays and revisions in a more efficient way, by updating only the affected queries. We compare RTEC_inc and RTEC experimentally using real-world and synthetic datasets. The results are compatible with our complexity analysis and show that RTEC_inc outperforms RTEC in many practical cases.
#SV5526
Anti-unification and Generalization: A Survey
Anti-unification (AU) is a fundamental operation for generalization computation used for inductive inference. It is the dual operation to unification, an operation at the foundation of automated theorem proving. Interest in AU from the AI and related communities is growing, but without a systematic study of the concept nor surveys of existing work, investigations often resort to developing application-specific methods that existing approaches may cover. We provide the first survey of AU research and its applications and a general framework for categorizing existing and future developments.
#SC14
Finite Entailment of UCRPQs over ALC Ontologies (Extended Abstract)
We investigate the problem of finite entailment of ontology-mediated queries. We consider the expressive query language, unions of conjunctive regular path queries (UCRPQs), extending the well-known class of unions of conjunctive queries, with regular expressions over roles. We look at ontologies formulated using the description logic ALC, and show a tight 2ExpTime upper bound for finite entailment of UCRPQs.
#4732
On Discovering Interesting Combinatorial Integer Sequences
We study the problem of generating interesting integer sequences with a combinatorial interpretation. For this we introduce a two-step approach. In the first step, we generate first-order logic sentences which define some combinatorial objects, e.g., undirected graphs, permutations, matchings etc. In the second step, we use algorithms for lifted first-order model counting to generate integer sequences that count the objects encoded by the first-order logic formulas generated in the first step. For instance, if the first-order sentence defines permutations then the generated integer sequence is the sequence of factorial numbers n!. We demonstrate that our approach is able to generate interesting new sequences by showing that a non-negligible fraction of the automatically generated sequences can actually be found in the Online Encyclopaedia of Integer Sequences (OEIS) while generating many other similar sequences which are not present in OEIS and which are potentially interesting. A key technical contribution of our work is the method for generation of first-order logic sentences which is able to drastically prune the space of sentences by discarding large fraction of sentences which would lead to redundant integer sequences.
#4313
A Rule-Based Modal View of Causal Reasoning
We present a novel rule-based semantics for causal reasoning as well as a number of modal languages interpreted over it. They enable us to represent some fundamental concepts in the theory of causality including causal necessity and possibility, interventionist conditionals and Lewisian conditionals. We provide complexity results for the satisfiability checking and model checking problem for these modal languages. Moreover, we study the relationship between our rule-based semantics and the structural equation modeling (SEM) approach to causal reasoning, as well as between our rule-based semantics for causal conditionals and the standard semantics for belief base change.
Tuesday 22nd August
15:30-16:50
S: Combinatorial Search and Optimisation
#3155
Exploring Structural Similarity in Fitness Landscapes via Graph Data Mining: A Case Study on Number Partitioning Problems
One of the most common problem-solving heuristics is by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus if a solver works for one problem instance, we expect it will also be effective for other instances whose fitness landscapes essentially share structural similarities with each other. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, by using local optima network as a proxy of fitness landscapes, this paper proposed to leverage graph data mining techniques to conduct qualitative and quantitative analyses to explore the latent topological structural information embedded in those landscapes. In our experiments, we use the number partitioning problem as the case and our empirical results are inspiring to support the overall assumption of the existence of structural similarity between landscapes within neighboring dimensions. Besides, experiments on simulated annealing demonstrate that the performance of a meta-heuristic solver is similar on structurally similar landscapes.
#3834
Sorting and Hypergraph Orientation under Uncertainty with Predictions
Learning-augmented algorithms have been attracting increasing interest, but have only recently been considered in the setting of explorable uncertainty where precise values of uncertain input elements can be obtained by a query and the goal is to minimize the number of queries needed to solve a problem. We study learning-augmented algorithms for sorting and hypergraph orientation under uncertainty, assuming access to untrusted predictions for the uncertain values. Our algorithms provide improved performance guarantees for accurate predictions while maintaining worst-case guarantees that are best possible without predictions. For sorting, our algorithm uses the optimal number of queries for accurate predictions and at most twice the optimal number for arbitrarily wrong predictions. For hypergraph orientation, for any γ≥2, we give an algorithm that uses at most 1+1/γ times the optimal number of queries for accurate predictions and at most γ times the optimal number for arbitrarily wrong predictions. These tradeoffs are the best possible. We also consider different error metrics and show that the performance of our algorithms degrades smoothly with the prediction error in all the cases where this is possible.
#3171
On Optimal Strategies for Wordle and General Guessing Games
The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contribution are several theorems that build towards a general theory to prove optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.
#3663
Diverse Approximations for Monotone Submodular Maximization Problems with a Matroid Constraint
Finding diverse solutions to optimization problems has been of practical interest for several decades, and recently enjoyed increasing attention in research. While submodular optimization has been rigorously studied in many fields, its diverse solutions extension has not. In this study, we consider the most basic variants of submodular optimization, and propose two simple greedy algorithms, which are known to be effective at maximizing monotone submodular functions. These are equipped with parameters that control the trade-off between objective and diversity. Our theoretical contribution shows their approximation guarantees in both objective value and diversity, as functions of their respective parameters. Our experimental investigation with maximum vertex coverage instances demonstrates their empirical differences in terms of objective-diversity trade-offs.
#1065
An Exact Algorithm for the Minimum Dominating Set Problem
The Minimum Dominating Set (MDS) problem is a classic NP-hard combinatorial optimization problem with many practical applications. Solving MDS is extremely challenging in computation. Previous work on exact algorithms mainly focuses on improving the theoretical time complexity and existing practical algorithms for MDS are almost based on heuristic search. In this paper, we propose a novel lower bound and an exact algorithm for MDS. The algorithm implements a branch-and-bound (BnB) approach and employs the new lower bound to reduce search space. Extensive empirical results show that the new lower bound is efficient in reduction of the search space and the new algorithm is effective for the standard instances and real-world instances. To the best of our knowledge, this is the first effective BnB algorithm for MDS.
#40
PathLAD+: An Improved Exact Algorithm for Subgraph Isomorphism Problem
The subgraph isomorphism problem (SIP) is a challenging problem with wide practical applications. In the last decade, despite being a theoretical hard problem, researchers design various algorithms for solving SIP. In this work, we propose three main heuristics and develop an improved exact algorithm for SIP. First, we design a probing search procedure to try whether the search procedure can successfully obtain a solution at first sight. Second, we design a novel matching ordering as a value-ordering heuristic, which uses some useful information obtained from the probing search procedure to preferentially select some promising target vertices. Third, we discuss the characteristics of different propagation methods in the context of SIP and present an adaptive propagation method to make a good balance between these methods. Experimental results on a broad range of real-world benchmarks show that our proposed algorithm performs better than state-of-the-art algorithms for the SIP.
#1068
A Refined Upper Bound and Inprocessing for the Maximum K-plex Problem
A k-plex of a graph G is an induced subgraph in which every vertex has at most k-1 nonadjacent vertices. The Maximum k-plex Problem (MKP) consists in finding a k-plex of the largest size, which is NP-hard and finds many applications. Existing exact algorithms mainly implement a branch-and-bound approach and improve performance by integrating effective upper bounds and graph reduction rules. In this paper, we propose a refined upper bound, which can derive a tighter upper bound than existing methods, and an inprocessing strategy, which performs graph reduction incrementally. We implement a new BnB algorithm for MKP that employs the two components to reduce the search space. Extensive experiments show that both the refined upper bound and the inprocessing strategy are very efficient in the reduction of search space. The new algorithm outperforms the state-of-the-art algorithms on the tested benchmarks significantly.
Tuesday 22nd August
15:30-16:50
AI for Social Good – ML (1/2)
#AI4SG5431
Customized Positional Encoding to Combine Static and Time-varying Data in Robust Representation Learning for Crop Yield Prediction
Accurate prediction of crop yield under the conditions of climate change is crucial to ensure food security. Transformers have shown remarkable success in modeling sequential data and hold the potential for improving crop yield prediction. To understand how weather and meteorological sequence variables affect crop yield, the positional encoding used in Transformers is typically shared across different sample sequences. We argue that it is necessary and beneficial to differentiate the positional encoding for distinct samples based on time-invariant properties of the sequences. Particularly, the sequence variables influencing crop yield vary according to static variables such as geographical locations. Sample data from southern areas may benefit from more tailored positional encoding different from that for northern areas. We propose a novel transformer based architecture for accurate and robust crop yield prediction, by introducing a Customized Positional Encoding (CPE) that encodes a sequence adaptively according to static information associated with the sequence. Empirical studies demonstrate the effectiveness of the proposed novel architecture and show that partially lin-
earized attention better captures the bias introduced by side information than softmax re-weighting. The resultant crop yield prediction model is robust to climate change, with mean-absolute-error reduced by up to 26% compared to the best baseline model in extreme drought years.
#AI4SG5611
Optimizing Crop Management with Reinforcement Learning and Imitation Learning
Crop management has a significant impact on crop yield, economic profit, and the environment. Although management guidelines exist, finding the optimal management practices is challenging. Previous work used reinforcement learning (RL) and crop simulators to solve the problem, but the trained policies either have limited performance or are not deployable in the real world. In this paper, we present an intelligent crop management system that optimizes nitrogen fertilization and irrigation simultaneously via RL, imitation learning (IL), and crop simulations using the Decision Support System for Agrotechnology Transfer (DSSAT). We first use deep RL, in particular, deep Q-network, to train management policies that require a large number of state variables from the simulator as observations (denoted as full observation). We then invoke IL to train management policies that only need a few state variables that can be easily obtained or measured in the real world (denoted as partial observation) by mimicking the actions of the RL policies trained under full observation. Simulation experiments using the maize crop in Florida (US) and Zaragoza (Spain) demonstrate that the trained policies from both RL and IL techniques achieved more than 45\% improvement in economic profit while causing less environmental impact compared with a baseline method. Most importantly, the IL-trained management policies are directly deployable in the real world as they use readily available information.
#AI4SG5814
Building a Personalized Messaging System for Health Intervention in Underprivileged Regions Using Reinforcement Learning
This work builds an effective AI-based message generation system for diabetes prevention in rural areas, where the diabetes rate has been increasing at an alarming rate. The messages contain information about diabetes causes and complications and the impact of nutrition and fitness on preventing diabetes. We propose to apply reinforcement learning (RL) to optimize our message selection policy over time, tailoring our messages to align with each individual participant’s needs and preferences. We conduct an extensive field study in a large country in Asia which involves more than 1000 participants who are local villagers and they receive messages generated by our system, over a period of six months. Our analysis shows that with the use of AI, we can deliver significant improvements in the participants’ diabetes-related knowledge, physical activity levels, and high-fat food avoidance, when compared to a static message set. Furthermore, we build a new neural network based behavior model to predict behavior changes of participants, trained on data collected during our study. By exploiting underlying characteristics of health-related behavior, we manage to significantly improve the prediction accuracy of our model compared to baselines.
#AI4SG5867
Keeping People Active and Healthy at Home Using a Reinforcement Learning-based Fitness Recommendation Framework
Recent years have seen a rise in smartphone applications promoting health and well being. We argue that there is a large and unexplored ground within the field of recommender systems (RS) for applications that promote good personal health. During the COVID-19 pandemic, with gyms being closed, the demand for at-home fitness apps increased as users wished to maintain their physical and mental health. However, maintaining long-term user engagement with fitness applications has proved a difficult task. Personalisation of the app recommendations that change over time can be a key factor for maintaining high user engagement. In this work we propose a reinforcement learning (RL) based framework for recommending sequences of body-weight exercises to home users over a mobile application interface. The framework employs a user simulator, tuned to feedback a weighted sum of realistic workout rewards, and trains a neural network model to maximise the expected reward over generated exercise sequences. We evaluate our framework within the context of a large 15 week live user trial, showing that an RL based approach leads to a significant increase in user engagement compared to a baseline recommendation algorithm.
#AI4SG1573
Full Scaling Automation for Sustainable Development of Green Data Centers
The rapid rise in cloud computing has resulted in an alarming increase in data centers’ carbon emissions, which now accounts for >3% of global greenhouse gas emissions, necessitating immediate steps to combat their mounting strain on the global climate. An important focus of this effort is to improve resource utilization in order to save electricity usage. Our proposed Full Scaling Automation (FSA) mechanism is an effective method of dynamically adapting resources to accommodate changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. FSA harnesses the power of deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level, unlike the previous autoscaling methods, such as Autopilot or FIRM, that need to adjust computing resources with statistical models and expert knowledge. Our approach achieves significant performance improvement compared to the existing work in real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers, and according to the certification of the China Environmental United Certification Center (CEC), a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1538,000 kWh of electricity, was achieved during the Double 11 shopping festival of 2022, marking a critical step for our company’s strategic goal towards carbon neutrality by 2030.
#AI4SG5683
Coupled Point Process-based Sequence Modeling for Privacy-preserving Network Alignment
Network alignment aims at finding the correspondence of nodes across different networks, which is significant for many applications, e.g., fraud detection and crime network tracing across platforms.
In practice, however, accessing the topological information of different networks is often restricted and even forbidden, considering privacy and security issues.
Instead, what we observed might be the event sequences of the networks’ nodes in the continuous-time domain.
In this study, we develop a coupled neural point process-based (CPP) sequence modeling strategy, which provides a solution to privacy-preserving network alignment based on the event sequences.
Our CPP consists of a coupled node embedding layer and a neural point process module.
The coupled node embedding layer embeds one network’s nodes and explicitly models the alignment matrix between the two networks.
Accordingly, it parameterizes the node embeddings of the other network by the push-forward operation.
Given the node embeddings, the neural point process module jointly captures the dynamics of the two networks’ event sequences.
We learn the CPP model in a maximum likelihood estimation framework with an inverse optimal transport (IOT) regularizer.
Experiments show that our CPP is compatible with various point process backbones and is robust to the model misspecification issue, which achieves encouraging performance on network alignment.
The code is available at https://github.com/Dixin-s-Lab/CNPP.
Tuesday 22nd August
17:00-18:30
Demos 1
#DM5729
NeoMaPy: A Framework for Computing MAP Inference on Temporal Knowledge Graphs
Markov Logic Networks (MLN) are used for reasoning on uncertain and inconsistent temporal data. We proposed the TMLN (Temporal Markov Logic Network) which extends them with sorts/types, weights on rules and facts, and various temporal consistencies. The NeoMaPy framework integrates it as a knowledge graph based on conflict graphs which offers flexibility for reasoning with parametric Maximum A Posteriori (MAP) inferences, efficiency with an optimistic heuristic and interactive graph visualization for results explanation.
#DM5695
Latent Inspector: An Interactive Tool for Probing Neural Network Behaviors Through Arbitrary Latent Activation
This work presents an active software instrument allowing deep learning architects to interactively inspect neural network models’ output behavior from user-manipulated values in any latent layer. Latent Inspector offers multiple dimension reduction techniques to visualize the model’s high dimensional latent layer output in human-perceptible, two-dimensional plots. The system is implemented with Node.js front end for interactive user input and Python back end for interacting with the model. By utilizing a general and modular architecture, our proposed solution dynamically adapts to a versatile range of models and data structures. Compared to already existing tools, our asynchronous approach of separating the training process from the inspection offers additional possibilities, such as interactive data generation, by actively working with the model instead of visualizing training logs. Overall, Latent Inspector demonstrates the possibilities as well as the appearing limits for providing a generalized, tool-based concept for enhancing model insight in terms of explainable and transparent AI.
#DM5735
Bias On Demand: Investigating Bias with a Synthetic Data Generator
Machine Learning (ML) systems are increasingly being adopted to make decisions that might have a significant impact on people’s lives. Because these decision-making systems rely on data-driven learning, the risk is that they will systematically propagate the bias embedded in the data. To prevent harmful consequences, it is essential to comprehend how and where bias is introduced and possibly how to mitigate it. We demonstrate Bias on Demand, a framework to generate synthetic datasets with different types of bias, which is available as an open-source toolkit and as a pip package. We include a demo of our proposed synthetic data generator, in which we illustrate experiments on different scenarios to showcase the interconnection between biases and their effect on performance and fairness evaluations. We encourage readers to explore the full paper for a more detailed analysis.
#DM5728
SiWare: Contextual Understanding of Industrial Data for Situational Awareness
SiWare is an AI-powered Knowledge Discovery system, that helps unlock new insights and accelerates data-driven decisions with contextualized Industrial data. SiWare links and fuses heterogeneous data sources with an industry semantic model leveraging multiple AI capabilities to provide system-wide visibility into operational characteristics. As part of this demo paper, we describe the requirements for such a system, and deployment aspects, and demonstrate the benefits in two industrial scenarios.
#DM5703
Fedstellar: A Platform for Training Models in a Privacy-preserving and Decentralized Fashion
This paper presents Fedstellar, a platform for training decentralized Federated Learning (FL) models in heterogeneous topologies in terms of the number of federation participants and their connections. Fedstellar allows users to build custom topologies, enabling them to control the aggregation of model parameters in a decentralized manner. The platform offers a Web application for creating, managing, and connecting nodes to ensure data privacy and provides tools to measure, monitor, and analyze the performance of the nodes. The paper describes the functionalities of Fedstellar and its potential applications. To demonstrate the applicability of the platform, different use cases are presented in which decentralized, semi-decentralized, and centralized architectures are compared in terms of model performance, convergence time, and network overhead when collaboratively classifying hand-written digits using the MNIST dataset.
#DM5741
AutoML for Outlier Detection with Optimal Transport Distances
Automated machine learning (AutoML) has been widely researched and adopted for supervised problems, but progress in unsupervised settings has been limited. We propose `”LOTUS”, a novel framework to automate outlier detection based on meta-learning. Our premise is that the selection of the optimal outlier detection technique depends on the inherent properties of the data distribution. We leverage optimal transport to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution. We evaluate the robustness of our framework and find that it outperforms all state-of-the-art automated outlier detection tools. This approach can also be easily generalized to automate other unsupervised settings.
Wednesday 23rd August
Wednesday 23rd August
10:15-11:15
Machine Learning (3/12)
#84
Adversarial Amendment is the Only Force Capable of Transforming an Enemy into a Friend
Adversarial attack is commonly regarded as a huge threat to neural networks because of misleading behavior. This paper presents an opposite perspective: adversarial attacks can be harnessed to improve neural models if amended correctly. Unlike traditional adversarial defense or adversarial training schemes that aim to improve the adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy level of neural models on benign samples. We thoroughly analyze the distribution mismatch between the benign and adversarial samples. This distribution mismatch and the mutual learning mechanism with the same learning ratio applied in prior art defense strategies is the main cause leading the accuracy degradation for benign samples. The proposed AdvAmd is demonstrated to steadily heal the accuracy degradation and even leads to a certain accuracy boost of common neural models on benign classification, object detection, and segmentation tasks. The efficacy of the AdvAmd is contributed by three key components: mediate samples (to reduce the influence of distribution mismatch with a fine-grained amendment), auxiliary batch norm (to solve the mutual learning mechanism and the smoother judgment surface), and AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities) through quantitative and ablation experiments.
#449
CTW: Confident Time-Warping for Time-Series Label-Noise Learning
Noisy labels seriously degrade the generalization ability of Deep Neural Networks (DNNs) in various classification tasks. Existing studies on label-noise learning mainly focus on computer vision, while time series also suffer from the same issue. Directly applying the methods from computer vision to time series may reduce the temporal dependency due to different data characteristics. How to make use of the properties of time series to enable DNNs to learn robust representations in the presence of noisy labels has not been fully explored. To this end, this paper proposes a method that expands the distribution of Confident instances by Time-Warping (CTW) to learn robust representations of time series. Specifically, since applying the augmentation method to all data may introduce extra mislabeled data, we select confident instances to implement Time-Warping. In addition, we normalize the distribution of the training loss of each class to eliminate the model’s selection preference for instances of different classes, alleviating the class imbalance caused by sample selection. Extensive experimental results show that CTW achieves state-of-the-art performance on the UCR datasets when dealing with different types of noise. Besides, the t-SNE visualization of our method verifies that augmenting confident data improves the generalization ability. Our code is available at https://github.com/qianlima-lab/CTW.
#1476
Multi-objective Optimization-based Selection for Quality-Diversity by Non-surrounded-dominated Sorting
Quality-Diversity (QD) algorithms, a subset of evolutionary algorithms, maintain an archive (i.e., a set of solutions) and simulate the natural evolution process through iterative selection and reproduction, with the goal of generating a set of high-quality and diverse solutions. Though having found many successful applications in reinforcement learning, QD algorithms often select the parent solutions uniformly at random, which lacks selection pressure and may limit the performance. Recent studies have treated each type of behavior of a solution as an objective, and selected the parent solutions based on Multi-objective Optimization (MO), which is a natural idea, but has not lead to satisfactory performance as expected. This paper gives the reason for the first time, and then proposes a new MO-based selection method by non-surrounded-dominated sorting (NSS), which considers all possible directions of the behaviors, and thus can generate diverse solutions over the whole behavior space. By combining NSS with the most widespread QD algorithm, MAP-Elites, we perform experiments on synthetic functions and several complex tasks (i.e., QDGym, robotic arm, and Mario environment generation), showing that NSS achieves better performance than not only other MO-based selection methods but also state-of-the-art selection methods in QD.
#3535
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data
For many real-world time series tasks, the computational complexity of prevalent deep leaning models often hinders the deployment on resource limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between model training (source) and deploying (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some of existing works have already explored cross-domain knowledge distillation for model compression, they are either biased to source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called UNiversal and joInt Knowledge Distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align teacher’s and student’s representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks. The source code is available at https://github.com/ijcai2023/UNI KD.
#5225
Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction
Organic reaction prediction is an important task in drug discovery. Recently, non-autoregressive reaction prediction has been achieved through modeling redistribution of electrons, reaching state-of-the-art top-1 accuracy and enabling parallel sampling. However, the current non-autoregressive decoder does not simultaneously fulfill two important rules of electron distribution modeling, the electron-counting rule and the symmetry rule, which violates the physical constraints of chemical reactions and thereby impairs the model performance. In this work, we propose a novel framework ReactionSink by combining two doubly stochastic self-attention mappings to obtain electron redistribution predictions that follow the above two constraints and further extend our solution to general multi-head attention mechanism with augmented constraints. To achieve this, we apply Sinkhorn’s algorithm to iteratively update self-attention mappings, which imposes doubly conservative constraint as an additional information prior on electron redistribution modeling. We theoretically show that our ReactionSink can satisfy both rules at the same time while the current decoder mechanism has to violate either of them. Empirical results demonstrate that our approach consistently improves the predictive performance of non-autoregressive models and does not bring unbearable additional computational cost.
#1099
LGPConv: Learnable Gaussian Perturbation Convolution for Lightweight Pansharpening
Pansharpening is a crucial and challenging task that aims to obtain a high spatial resolution image by merging a multispectral (MS) image and a panchromatic (PAN) image. Current methods use CNNs with standard convolution, but we’ve observed strong correlation among channel dimensions in the kernel, leading to computational burden and redundancy. To address this, we propose Learnable Gaussian Perturbation Convolution (LGPConv), surpassing standard convolution. LGPConv leverages two properties of standard convolution kernels: 1) correlations within channels, learning a premier kernel as a base to reduce parameters and training difficulties caused by redundancy; 2) introducing Gaussian noise perturbations to simulate randomness and enhance nonlinear representation within channels. We incorporate LGPConv into a well-designed pansharpening network and demonstrate its superiority through extensive experiments, achieving state-of-the-art performance with minimal parameters (27K). Code is available on the GitHub page of the authors.
Wednesday 23rd August
10:15-11:15
ML: Time Series and Data Streams
#4391
Self-Recover: Forecasting Block Maxima in Time Series from Predictors with Disparate Temporal Coverage Using Self-Supervised Learning
Forecasting the block maxima of a future time window is a challenging task due to the difficulty in inferring the tail distribution of a target variable. As the historical observations alone may not be sufficient to train robust models to predict the block maxima, domain-driven process models are often available in many scientific domains to supplement the observation data and improve the forecast accuracy. Unfortunately, coupling the historical observations with process model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework to predict the block maxima of a time window by employing self-supervised learning to address the varying temporal data coverage problem. Specifically Self-Recover uses a combination of contrastive and generative self-supervised learning schemes along with a denoising autoencoder to impute the missing values. The framework also combines representations of the historical observations with process model outputs via a residual learning approach and learns the generalized extreme value (GEV) distribution characterizing the block maxima values. This enables the framework to reliably estimate the block maxima of each time window along with its confidence interval. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover compared to other state-of-the-art forecasting methods.
#3434
pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting
Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that model is based on. Due to the fact that the probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.
#SV5579
Transformers in Time Series: A Survey
Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interest in the time series community. Among multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations. In particular, we examine the development of time series Transformers in two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers in order to accommodate the challenges in time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance.
#3162
DiffAR: Adaptive Conditional Diffusion Model for Temporal-augmented Human Activity Recognition
Human activity recognition (HAR) is a fundamental sensing and analysis technique that supports diverse applications, such as smart homes and healthcare. In device-free and non-intrusive HAR, WiFi channel state information (CSI) captures wireless signal variations caused by human interference without the need for video cameras or on-body sensors. However, current CSI-based HAR performance is hampered by incomplete CSI recordings due to fixed window sizes in CSI collection and human/machine errors that incur missing values in CSI. To address these issues, we propose DiffAR, a temporal-augmented HAR approach that improves HAR performance by augmenting CSI. DiffAR devises a novel Adaptive Conditional Diffusion Model (ACDM) to synthesize augmented CSI, which tackles the issue of fixed windows by forecasting and handles missing values with imputation. Compared to existing diffusion models, ACDM improves the synthesis quality by guiding progressive synthesis with step-specific conditions. DiffAR further exploits an ensemble classifier for activity recognition using both raw and augmented CSI. Extensive experiments on four public datasets show that DiffAR achieves the best synthesis quality of augmented CSI and outperforms state-of-the-art CSI-based HAR methods in recognition performance. The source code of DiffAR is available at https://github.com/huangshk/DiffAR.
#5155
Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data
To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy low-resourced sensors’ communication and computational constraints. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.
#3556
Not Only Pairwise Relationships: Fine-Grained Relational Modeling for Multivariate Time Series Forecasting
Recent graph-based methods achieve significant success in multivariate time series modeling and forecasting due to their ability to handle relationships among time series variables. However, only pairwise relationships are considered in most existing works. They ignore beyond-pairwise relationships and their potential categories in practical scenarios, which leads to incomprehensive relationship learning for multivariate time series forecasting. In this paper, we present ReMo, a Relational Modeling-based method, to promote fine-grained relational learning among multivariate time series data. Firstly, by treating time series variables and complex relationships as nodes and hyperedges, we extract multi-view hypergraphs from data to capture beyond-pairwise relationships. Secondly, a novel hypergraph message passing strategy is designed to characterize both nodes and hyperedges by inferring the potential categories of relationships and further distinguishing their impacts on time series variables. By integrating these two modules into the time series forecasting framework, ReMo effectively improves the performance of multivariate time series forecasting. The experimental results on seven commonly used datasets from different domains demonstrate the superiority of our model.
Wednesday 23rd August
10:15-11:15
ML: Neuro-symbolic Methods
#3049
Neuro-Symbolic Class Expression Learning
Models computed using deep learning have been effectively applied to tackle various problems in many disciplines. Yet, the predictions of these models are often at most post-hoc and locally explainable.
In contrast, class expressions in description logics are ante-hoc and globally explainable. Although state-of-the-art symbolic machine learning approaches are being successfully applied to learn class expressions, their application at large scale has been hindered by their impractical runtimes. Arguably, the reliance on myopic heuristic functions contributes to this limitation. We propose a novel neuro-symbolic class expression learning model, DRILL, to mitigate this limitation. By learning non-myopic heuristic functions with deep Q-learning, DRILL efficiently steers the standard search procedure in a quasi-ordered search space towards goal states. Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that DRILL converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirms that DRILL converges to goal states significantly faster (p-value <1%) than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of DRILL, including pre-trained models, training and evaluation scripts.
#5164
Learning to Binarize Continuous Features for Neuro-Rule Networks
Neuro-Rule Networks (NRNs) emerge as a promising neuro-symbolic method, enjoyed by the ability to equate fully-connected neural networks with logic rules. To support learning logic rules consisting of boolean variables, converting input features into binary representations is required. Different from discrete features that could be directly transformed by one-hot encodings, continuous features need to be binarized based on some numerical intervals. Existing studies usually select the bound values of intervals based on empirical strategies (e.g., equal-width interval). However, it is not optimal since the bounds are fixed and cannot be optimized to accommodate the ultimate training target. In this paper, we propose AutoInt, an approach that automatically binarizes continuous features and enables the intervals to be optimized with NRNs in an end-to-end fashion. Specifically, AutoInt automatically selects an interval for a given continuous feature in a soft manner to enable a differentiable learning procedure of interval-related parameters. Moreover, it introduces an additional soft K-means clustering loss to make the interval centres approach the original feature value distribution, thus reducing the risk of overfitting intervals. We conduct comprehensive experiments on public datasets and demonstrate the effectiveness of AutoInt in boosting the performance of NRNs.
#3943
Neuro-Symbolic Learning of Answer Set Programs from Raw Data
One of the ultimate goals of Artificial Intelligence is to assist humans in complex decision making. A promising direction for achieving this goal is Neuro-Symbolic AI, which aims to combine the interpretability of symbolic techniques with the ability of deep learning to learn from raw data. However, most current approaches require manually engineered symbolic knowledge, and where end-to-end training is considered, such approaches are either restricted to learning definite programs, or are restricted to training binary neural networks. In this paper, we introduce Neuro-Symbolic Inductive Learner (NSIL), an approach that trains a general neural network to extract latent concepts from raw data, whilst learning symbolic knowledge that maps latent concepts to target labels. The novelty of our approach is a method for biasing the learning of symbolic knowledge, based on the in-training performance of both neural and symbolic components. We evaluate NSIL on three problem domains of different complexity, including an NP-complete problem. Our results demonstrate that NSIL learns expressive knowledge, solves computationally complex problems, and achieves state-of-the-art performance in terms of accuracy and data efficiency. Code and technical appendix: https://github.com/DanCunnington/NSIL
#2459
Scalable Coupling of Deep Learning with Logical Reasoning
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. We empirically show our loss function is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs as the symbolic, visual or many-solutions Sudoku problems as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and a posteriori control over predictions.
#3572
DeepPSL: End-to-End Perception and Reasoning
We introduce DeepPSL a variant of probabilistic soft logic (PSL) to produce an end-to-end trainable system that integrates reasoning and perception. PSL represents first-order logic in terms of a convex graphical model – hinge-loss Markov random fields (HL-MRFs). PSL stands out among probabilistic logic frameworks due to its tractability having been applied to systems of more than 1 billion ground rules. The key to our approach is to represent predicates in first-order logic using deep neural networks and then to approximately back-propagate through the HL-MRF and thus train every aspect of the first-order system being represented. We believe that this approach represents an interesting direction for the integration of deep learning and reasoning techniques with applications to knowledge base learning, multi-task learning, and explainability. Evaluation on three different tasks demonstrates that DeepPSL significantly outperforms state-of-the-art neuro-symbolic methods on scalability while achieving comparable or better accuracy.
#2759
Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions
Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symboilic Learning (DSL), a NeSy system that learns \emph{NeSy-functions}, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL simultaneously learns the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.
Wednesday 23rd August
10:15-11:15
Computer Vision (2/6)
#1798
Multi-Modality Deep Network for JPEG Artifacts Reduction
In recent years, many convolutional neural network-based models are designed for JPEG artifacts reduction, and have achieved notable progress. However, few methods are suitable for extreme low-bitrate image compression artifacts reduction. The main challenge is that the highly compressed image loses too much information, resulting in reconstructing high-quality image difficultly. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifacts reduction, in which the corresponding text description not only provides the potential prior information of the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from the global and local perspectives respectively, and design a contrastive loss built upon contrastive learning to produce visually pleasing results. Extensive experiments, including a user study, prove that our method can obtain better deblocking results compared to the state-of-the-art methods.
#1626
PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
Accurately perceiving instances and predicting their future motion are key tasks for autonomous vehicles, enabling them to navigate safely in complex urban traffic. While bird’s-eye view (BEV) representations are commonplace in perception for autonomous driving, their potential in a motion prediction setting is less explored. Existing approaches for BEV instance prediction from surround cameras rely on a multi-task auto-regressive setup coupled with complex post-processing to predict future instances in a spatio-temporally consistent manner. In this paper, we depart from this paradigm and propose an efficient novel end-to-end framework named PowerBEV, which differs in several design choices aimed at reducing the inherent redundancy in previous methods. First, rather than predicting the future in an auto-regressive fashion, PowerBEV uses a parallel, multi-scale module built from lightweight 2D convolutional networks. Second, we show that segmentation and centripetal backward flow are sufficient for prediction, simplifying previous multi-task objectives by eliminating redundant output modalities. Building on this output representation, we propose a simple, flow warping-based post-processing approach which produces more stable instance associations across time. Through this lightweight yet powerful design, PowerBEV outperforms state-of-the-art baselines on the NuScenes Dataset and poses an alternative paradigm for BEV instance prediction. We made our code publicly available at: https://github.com/EdwardLeeLPZ/PowerBEV.
#658
Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection
Change detection is a widely adopted technique in remote sense imagery (RSI) analysis in the discovery of long-term geomorphic evolution. To highlight the areas of semantic changes, previous effort mostly pays attention to learning representative feature descriptors of a single image, while the difference information is either modeled with simple difference operations or implicitly embedded via feature interactions. Nevertheless, such difference modeling can be noisy since it suffers from non-semantic changes and lacks explicit guidance from image content or context. In this paper, we revisit the importance of feature difference for change detection in RSI, and propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling (APD). Firstly, alignment leverages contextual similarity to compensate for the non-semantic difference in feature space. Next, a difference module trained with semantic-wise perturbation is adopted to learn more generalized change estimators, which reversely bootstraps feature extraction and prediction. Finally, a decoupled dual-decoder structure is designed to predict semantic changes in both content-aware and content-agnostic manners. Extensive experiments are conducted on benchmarks of LEVIR-CD, WHU-CD and DSIFN-CD, demonstrating our proposed operations bring significant improvement and achieve competitive results under similar comparative conditions. Code is available at https://github.com/wangsp1999/CD-Research/tree/main/openAPD
#1152
WBFlow: Few-shot White Balance for sRGB Images via Reversible Neural Flows
The sRGB white balance methods aim to correct the nonlinear color cast of sRGB images without accessing raw values. Although existing methods have achieved increasingly better results, their generalization to sRGB images from multiple cameras is still under explored. In this paper, we propose the network named WBFlow that not only performs superior white balance for sRGB images but also generalizes well to multiple cameras. Specifically, we take advantage of neural flow to ensure the reversibility of WBFlow, which enables lossless rendering of color cast sRGB images back to pseudo raw features for linear white balancing and thus achieves superior performance. Furthermore, inspired by camera transformation approaches, we have designed a camera transformation (CT) in pseudo raw feature space to generalize WBFlow for different cameras via few shot learning. By utilizing a few sRGB images from an untrained camera, our WBFlow can perform well on this camera by learning the camera specific parameters of CT. Extensive experiments show that WBFlow achieves superior camera generalization and accuracy on three public datasets as well as our rendered multiple camera sRGB dataset. Our code is available at https://github.com/ChunxiaoLe/WBFlow.
#2048
Video Diffusion Models with Local-Global Context Guidance
Diffusion models have emerged as a powerful paradigm in video synthesis tasks including prediction, generation, and interpolation. Due to the limitation of the computational budget, existing methods usually implement conditional diffusion models with an autoregressive inference pipeline, in which the future fragment is predicted based on the distribution of adjacent past frames. However, only the conditions from a few previous frames can’t capture the global temporal coherence, leading to inconsistent or even outrageous results in long-term video prediction. In this paper, we propose a Local-Global Context guided Video Diffusion model (LGC-VD) to capture multi-perception conditions for producing high-quality videos in both conditional/unconditional settings. In LGC-VD, the UNet is implemented with stacked residual blocks with self-attention units, avoiding the undesirable computational cost in 3D Conv. We construct a local-global context guidance strategy to capture the multi-perceptual embedding of the past fragment to boost the consistency of future prediction. Furthermore, we propose a two-stage training strategy to alleviate the effect of noisy frames for more stable predictions. Our experiments demonstrate that the proposed method achieves favorable performance on video prediction, interpolation, and unconditional video generation. We release code at https://github.com/exisas/LGC-VD.
Wednesday 23rd August
10:15-11:15
MAS: Multi-agent Learning (1/2)
#3724
Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium
Repeated games consider a situation where multiple agents are motivated by their independent rewards throughout learning. In general, the dynamics of their learning become complex. Especially when their rewards compete with each other like zero-sum games, the dynamics often do not converge to their optimum, i.e., the Nash equilibrium. To tackle such complexity, many studies have understood various learning algorithms as dynamical systems and discovered qualitative insights among the algorithms. However, such studies have yet to handle multi-memory games (where agents can memorize actions they played in the past and choose their actions based on their memories), even though memorization plays a pivotal role in artificial intelligence and interpersonal relationship. This study extends two major learning algorithms in games, i.e., replicator dynamics and gradient ascent, into multi-memory games. Then, we prove their dynamics are identical. Furthermore, theoretically and experimentally, we clarify that the learning dynamics diverge from the Nash equilibrium in multi-memory zero-sum games and reach heteroclinic cycles (sojourn longer around the boundary of the strategy space), providing a fundamental advance in learning in games.
#2653
Decentralized Anomaly Detection in Cooperative Multi-Agent Reinforcement Learning
We consider the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning. We propose a decentralized scheme that allows agents to detect the abnormal behavior of one compromised agent. Our approach is based on a recurrent neural network (RNN) trained during cooperative learning to predict the action distribution of other agents based on local observations. The predicted distribution is used for computing a normality score for the agents, which allows the detection of the misbehavior of other agents. To explore the robustness of the proposed detection scheme, we formulate the worst-case attack against our scheme as a constrained reinforcement learning problem. We propose to compute an attack policy by optimizing the corresponding dual function using reinforcement learning. Extensive simulations on various multi-agent benchmarks show the effectiveness of the proposed detection scheme in detecting state-of-the-art attacks and in limiting the impact of undetectable attacks.
#4663
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas.
In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner’s Dilemma, Volunteer’s Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.
#1868
Anticipatory Fictitious Play
Fictitious play is an algorithm for computing Nash equilibria of matrix games. Recently, machine learning variants of fictitious play have been successfully applied to complicated real-world games. This paper presents a simple modification of fictitious play which is a strict improvement over the original: it has the same theoretical worst-case convergence rate, is equally applicable in a machine learning context, and enjoys superior empirical performance. We conduct an extensive comparison of our algorithm with fictitious play, proving an optimal O(1/t) convergence rate for certain classes of games, demonstrating superior performance numerically across a variety of games, and concluding with experiments that extend these algorithms to the setting of deep multiagent reinforcement learning.
#4671
Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption.
Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents’ tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close’ to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance’ to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest’ network zero-sum game can be found efficiently. Importantly, our theoretical guarantees are widely applicable in different game settings, regardless of whether the dynamics ultimately reach an equilibrium, or remain non convergent.
#J5758
Multi-Agent Advisor Q-Learning (Extended Abstract)
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. We provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning-based algorithms: ADMIRAL – Decision Making (ADMIRAL-DM) and ADMIRAL – Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.
Wednesday 23rd August
10:15-11:15
CV: Recognition (Object Detection, Categorization) (1/3)
#86
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a dedicated designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.
#5090
Orientation-Independent Chinese Text Recognition in Scene Images
Scene text recognition (STR) has attracted much attention due to its broad applications. The previous works pay more attention to dealing with the recognition of Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Different from Latin texts, many vertical Chinese texts exist in natural scenes, which brings difficulties to current state-of-the-art STR methods. In this paper, we take the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images, thus recognizing both horizontal and vertical texts robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves 45.63\% improvement on VCTR when introducing CIRN to the baseline model.
#1593
Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which is then incorporated into the super-resolution branch in an adaptive manner using the proposed adaptive fusion module. Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods. Code is available at https://github.com/csguoh/LEMMA.
#1067
Few-shot Classification via Ensemble Learning with Multi-Order Statistics
Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining good generalization representation of images on novel classes is the key to improving the few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error in the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper. In this method, after the backbone network, we use multiple branches to create the individual learners in the ensemble learning, with the goal to reduce the storage cost. We then introduce different order statistics pooling in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and our method can produce a state-of-the-art performance on multiple few-shot classification benchmark datasets.
#1235
Independent Feature Decomposition and Instance Alignment for Unsupervised Domain Adaptation
Existing Unsupervised Domain Adaptation (UDA) methods typically attempt to perform knowledge transfer in a domain-invariant space explicitly or implicitly. In practice, however, the obtained features is often mixed with domain-specific information which causes performance degradation. To overcome this fundamental limitation, this article presents a novel independent feature decomposition and instance alignment method (IndUDA in short). Specifically, based on an invertible flow, we project the base features into a decomposed latent space with domain-invariant and domain-specific dimensions. To drive semantic decomposition independently, we then swap the domain-invariant part across source and target domain samples with the same category and require their inverted features are consistent in class-level with the original features. By treating domain-specific information as noise, we replace it by Gaussian noise and further regularize source model training by instance alignment, i.e., requiring the base features close to the corresponding reconstructed features, respectively. Extensive experiment results demonstrate that our method achieves state-of-the-art performance on popular UDA benchmarks. The appendix and code are available at https://github.com/ayombeach/IndUDA.
#668
Universal Adaptive Data Augmentation
Existing automatic data augmentation (DA) methods either ignore updating DA’s parameters according to the target model’s state during training or adopt update strategies that are not effective enough. In this work, we design a novel data augmentation strategy called “Universal Adaptive Data Augmentation” (UADA). Different from existing methods, UADA would adaptively update DA’s parameters according to the target model’s gradient information during training: given a pre-defined set of DA operations, we randomly decide types and magnitudes of DA operations for every data batch during training, and adaptively update DA’s parameters along the gradient direction of the loss concerning DA’s parameters. In this way, UADA can increase the training loss of the target networks, and the target networks would learn features from harder samples to improve the generalization. Moreover, UADA is very general and can be utilized in numerous tasks, e.g., image classification, semantic segmentation and object detection. Extensive experiments with various models are conducted on CIFAR-10, CIFAR-100, ImageNet, tiny-ImageNet, Cityscapes, and VOC07+12 to prove the significant performance improvements brought by UADA.
Wednesday 23rd August
10:15-11:15
DM: Mining Graphs (1/2)
#3814
Multi-Scale Subgraph Contrastive Learning
Graph-level contrastive learning, aiming to learn the representations for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply assume that a graph and its augmented graph as a positive pair, otherwise as a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? By an experimental analysis, we discover the semantic information of an augmented graph structure may be not consistent as original graph structure, and whether two augmented graphs are positive or negative pairs is highly related with the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyzes on eight graph classification real-world datasets well demonstrate the effectiveness of the proposed method.
#2338
CSGCL: Community-Strength-Enhanced Graph Contrastive Learning
Graph Contrastive Learning (GCL) is an effective way to learn generalized graph representations in a self-supervised manner, and has grown rapidly in recent years. However, the underlying community semantics has not been well explored by most previous GCL methods. Research that attempts to leverage communities in GCL regards them as having the same influence on the graph, leading to extra representation errors. To tackle this issue, we define ”community strength” to measure the difference of influence among communities. Under this premise, we propose a Community-Strength-enhanced Graph Contrastive Learning (CSGCL) framework to preserve community strength throughout the learning process. Firstly, we present two novel graph augmentation methods, Communal Attribute Voting (CAV) and Communal Edge Dropping (CED), where the perturbations of node attributes and edges are guided by community strength. Secondly, we propose a dynamic ”Team-up” contrastive learning scheme, where community strength is used to progressively fine-tune the contrastive objective. We report extensive experiment results on three downstream tasks: node classification, node clustering, and link prediction. CSGCL achieves state-of-the-art performance compared with other GCL methods, validating that community strength brings effectiveness and generality to graph representations. Our code is available at https://github.com/HanChen-HUST/CSGCL.
#1805
Totally Dynamic Hypergraph Neural Networks
Recent dynamic hypergraph neural networks (DHGNNs) are designed to adaptively optimize the hypergraph structure to avoid the dependence on the initial hypergraph structure, thus capturing more hidden information for representation learning. However, most existing DHGNNs cannot adjust the hyperedge number and thus fail to fully explore the underlying hypergraph structure. This paper proposes a new method, namely, totally hypergraph neural network (TDHNN), to adjust the hyperedge number for optimizing the hypergraph structure. Specifically, the proposed method first captures hyperedge feature distribution to obtain dynamical hyperedge features rather than fixed ones, by conducting the sampling from the learned distribution.
The hypergraph is then constructed based on the attention coefficients of both sampled hyperedges and nodes. The node features are dynamically updated by designing a simple hypergraph convolution algorithm. Experimental results on real datasets demonstrate the effectiveness of the proposed method, compared to SOTA methods. The source code can be accessed via https://github.com/HHW-zhou/TDHNN.
#3955
Enhancing Network by Reinforcement Learning and Neural Confined Local Search
It has been found that many real networks, such as power grids and the Internet, are non-robust, i.e., attacking a small set of nodes would cause the paralysis of the entire network. Thus, the Network Enhancement Problem~(NEP), i.e., improving the robustness of a given network by modifying its structure, has attracted increasing attention. Heuristics have been proposed to address NEP. However, a hand-engineered heuristic often has significant performance limitations. A recently proposed model solving NEP by reinforcement learning has shown superior performance than heuristics on in-distribution datasets. However, their model shows considerably inferior out-of-distribution generalization ability when enhancing networks against the degree-based targeted attack. In this paper, we propose a more effective model with stronger generalization ability by incorporating domain knowledge including measurements of local network structures and decision criteria of heuristics. We further design a hierarchical attention model to utilize the network structure directly, where the query range changes from local to global. Finally, we propose neural confined local search~(NCLS) to realize the effective search of a large neighborhood, which exploits a learned model to confine the neighborhood to avoid exhaustive enumeration. We conduct extensive experiments on synthetic and real networks to verify the ability of our models.
#3636
Semi-supervised Domain Adaptation in Graph Transfer Learning
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding. Thus, the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
#2276
Commonsense Knowledge Enhanced Sentiment Dependency Graph for Sarcasm Detection
Sarcasm is widely utilized on social media platforms such as Twitter and Reddit. Sarcasm detection is required for analyzing people’s true feelings since sarcasm is commonly used to portray a reversed emotion opposing the literal meaning. The syntactic structure is the key to make better use of commonsense when detecting sarcasm. However, it is extremely challenging to effectively and explicitly explore the information implied in syntactic structure and commonsense simultaneously. In this paper, we apply the pre-trained COMET model to generate relevant commonsense knowledge, and explore a novel scenario of constructing a commonsense-augmented sentiment graph and a commonsense-replaced dependency graph for each text. Based on this, a Commonsense Sentiment Dependency Graph Convolutional Network (CSDGCN) framework is proposed to explicitly depict the role of external commonsense and inconsistent expressions over the context for sarcasm detection by interactively modeling the sentiment and dependency information. Experimental results on several benchmark datasets reveal that our proposed method beats the state-of-the-art methods in sarcasm detection, and has a stronger interpretability.
Wednesday 23rd August
10:15-11:15
Game Theory and Economic Paradigms (1/2)
#4476
Discrete Two Player All-Pay Auction with Complete Information
We study discrete two player all-pay auction with complete information. We provide full characterization of mixed strategy Nash equilibria and show that they constitute a subset of Nash equilibria of discrete General Lotto game. We show that equilibria are not unique in general but they are interchangeable and sets of equilibrium strategies are convex. We also show that equilibrium payoffs are unique, unless valuation of at least one of the players is an even integer number. If equilibrium payoffs are not unique, continuum of equilibrium payoffs are possible.
#1966
Inferring Private Valuations from Behavioral Data in Bilateral Sequential Bargaining
Inferring bargainers’ private valuations on items from their decisions is crucial for analyzing their strategic behaviors in bilateral sequential bargaining. Most existing approaches that infer agents’ private information from observable data either rely on strong equilibrium assumptions or require a careful design of agents’ behavior models. To overcome these weaknesses, we propose a Bayesian Learning-based Valuation Inference (BLUE) framework. Our key idea is to derive feasible intervals of bargainers’ private valuations from their behavior data, using the fact that most bargainers do not choose strictly dominated strategies. We leverage these feasible intervals to guide our inference. Specifically, we first model each bargainer’s behavior function (which maps his valuation and bargaining history to decisions) via a recurrent neural network. Second, we learn these behavior functions by utilizing a novel loss function defined based on feasible intervals. Third, we derive the posterior distributions of bargainers’ valuations according to their behavior data and learned behavior functions. Moreover, we account for the heterogeneity of bargainer behaviors, and propose a clustering algorithm (K-Loss) to improve the efficiency of learning these behaviors. Experiments on both synthetic and real bargaining data show that our inference approach outperforms baselines.
#1834
A Unifying Formal Approach to Importance Values in Boolean Functions
Boolean functions and their representation through logics, circuits, machine learning classifiers, or binary decision diagrams (BDDs) play a central role in the design and analysis of computing systems. Quantifying the relative impact of variables on the truth value by means of importance values can provide useful insights to steer system design and debugging. In this paper, we introduce a uniform framework for reasoning about such values, relying on a generic notion of importance value functions (IVFs). The class of IVFs is defined by axioms motivated from several notions of importance values introduced in the literature, including Ben-Or and Linial’s influence and Chockler, Halpern, and Kupferman’s notion of responsibility and blame. We establish a connection between IVFs and game-theoretic concepts such as Shapley and Banzhaf values, both of which measure the impact of players on outcomes in cooperative games. Exploiting BDD-based symbolic methods and projected model counting, we devise and evaluate practical computation schemes for IVFs.
#J5688
Rethinking Formal Models of Partially Observable Multiagent Decision Making (Extended Abstract)
Multiagent decision-making in partially observable environments is usually modelled as either an extensive-form game (EFG) in game theory or a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship of the two models is unclear, which hinders the transfer of ideas between the two communities. A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition.
To solve the first issue, we introduce factored-observation stochastic games (FOSGs), a minor modification of the POSG formalism which distinguishes between private and public observation and thereby greatly simplifies decomposition. To remedy the second issue, we show that FOSGs and POSGs are naturally connected to EFGs: by “unrolling” a FOSG into its tree form, we obtain an EFG. Conversely, any perfect-recall timeable EFG corresponds to some underlying FOSG in this manner. Moreover, this relationship justifies several minor modifications to the classical EFG formalization that recently appeared as an implicit response to the model’s issues with decomposition. Finally, we illustrate the transfer of ideas between EFGs and MARL by presenting three key EFG techniques — counterfactual regret minimization, sequence form, and decomposition — in the FOSG framework.
#792
Optimal Seat Arrangement: What Are the Hard and Easy Cases?
We study four NP-hard optimal seat arrangement problems [Bodlaender et al., 2020a] which each have as input a set of n agents, where each agent has cardinal preferences over other agents, and an n-vertex undirected graph (called the seat graph). The task is to assign each agent to a distinct vertex in the seat graph such that either the sum of utilities or the minimum utility is maximized, or it is envy-free or exchange-stable. Aiming at identifying hard and easy cases, we extensively study the algorithmic complexity of the four problems by looking into natural graph classes for the seat graph (e.g., paths, cycles, stars, or matchings), problem-specific parameters (e.g., the number of non-isolated vertices in the seat graph or the maximum number of agents towards whom an agent has non-zero preferences), and preference structures (e.g., non-negative or symmetric preferences). For strict preferences and seat graphs with disjoint edges and isolated vertices, we correct an error by Bodlaender et al. [2020b] and show that finding an envy-free arrangement remains NP-hard in this case.
#4228
Auto-bidding with Budget and ROI Constrained Buyers
In online advertising markets, an increasing number of advertisers are adopting auto-bidders to buy advertising slots. This tool simplifies the process of optimizing bids based on various financial constraints.
In our study, we focus on second-price auctions where bidders have both private budget and private ROI (return on investment) constraints. We formulate the auto-bidding system design problem as a mathematical program and analyze the auto-bidders’ bidding strategy under such constraints. We demonstrate that our design ensures truthfulness, i.e., among all pure and mixed strategies, always reporting the truthful budget and ROI is an optimal strategy for the bidders. Although the program is non-convex, we provide a fast algorithm to compute the optimal bidding strategy for the bidders based on our analysis. We also study the welfare and provide a lower bound for the PoA (price of anarchy). Moreover, we prove that if all bidders utilize our auto-bidding system, a Bayesian Nash equilibrium exists. We provide a sufficient condition under which the iterated best response process converges to such an equilibrium. Finally, we conduct extensive experiments to empirically evaluate the effectiveness of our design.
Wednesday 23rd August
10:15-11:15
HAI: Cognitive Modeling
#4170
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods.
#2849
Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks
The brain-inspired spiking neural networks (SNNs) are receiving increasing attention due to their asynchronous event-driven characteristics and low power consumption. As attention mechanisms recently become an indispensable part of sequence dependence modeling, the combination of SNNs and attention mechanisms holds great potential for energy-efficient and high-performance computing paradigms. However, the existing works cannot benefit from both temporal-wise attention and the asynchronous characteristic of SNNs. To fully leverage the advantages of both SNNs and attention mechanisms, we propose an SNNs-based spatial-temporal self-attention (STSA) mechanism, which calculates the feature dependence across the time and space domains without destroying the asynchronous transmission properties of SNNs. To further improve the performance, we also propose a spatial-temporal relative position bias (STRPB) for STSA to consider the spatiotemporal position of spikes. Based on the STSA and STRPB, we construct a spatial-temporal spiking Transformer framework, named STS-Transformer, which is powerful and enables SNNs to work in an asynchronous event-driven manner. Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our experimental results can outperform other state-of-the-art models.
#930
A Low Latency Adaptive Coding Spike Framework for Deep Reinforcement Learning
In recent years, spiking neural networks (SNNs) have been used in reinforcement learning (RL) due to their low power consumption and event-driven features. However, spiking reinforcement learning (SRL), which suffers from fixed coding methods, still faces the problems of high latency and poor versatility. In this paper, we use learnable matrix multiplication to encode and decode spikes, improving the flexibility of the coders and thus reducing latency. Meanwhile, we train the SNNs using the direct training method and use two different structures for online and offline RL algorithms, which gives our model a wider range of applications. Extensive experiments have revealed that our method achieves optimal performance with ultra-low latency (as low as 0.8% of other SRL methods) and excellent energy efficiency (up to 5X the DNNs) in different algorithms and different environments.
#3138
Sketch Recognition via Part-based Hierarchical Analogical Learning
Sketch recognition has been studied for decades, but it is far from solved. Drawing styles are highly variable across people and adapting to idiosyncratic visual expressions requires data-efficient learning. Explainability also matters, so that users can see why a system got confused about something. This paper introduces a novel part-based approach for sketch recognition, based on hierarchical analogical learning, a new method to apply analogical learning to qualitative representations. Given a sketched object, our system automatically segments it into parts and constructs multi-level qualitative representations of them. Our approach performs analogical generalization at multiple levels of part descriptions and uses coarse-grained results to guide interpretation at finer levels. Experiments on the Berlin TU dataset and the Coloring Book Objects dataset show that the system can learn explainable models in a data-efficient manner.
#2072
Learnable Surrogate Gradient for Direct Training Spiking Neural Networks
Spiking neural networks (SNNs) have increasingly drawn massive research attention due to biological interpretability and efficient computation. Recent achievements are devoted to utilizing the surrogate gradient (SG) method to avoid the dilemma of non-differentiability of spiking activity to directly train SNNs by backpropagation. However, the fixed width of the SG leads to gradient vanishing and mismatch problems, thus limiting the performance of directly trained SNNs. In this work, we propose a novel perspective to unlock the width limitation of SG, called the learnable surrogate gradient (LSG) method. The LSG method modulates the width of SG according to the change of the distribution of the membrane potentials, which is identified to be related to the decay factors based on our theoretical analysis. Then we introduce the trainable decay factors to implement the LSG method, which can optimize the width of SG automatically during training to avoid the gradient vanishing and mismatch problems caused by the limited width of SG. We evaluate the proposed LSG method on both image and neuromorphic datasets. Experimental results show that the LSG method can effectively alleviate the blocking of gradient propagation caused by the limited width of SG when training deep SNNs directly. Meanwhile, the LSG method can help SNNs achieve competitive performance on both latency and accuracy.
#4842
A New ANN-SNN Conversion Method with High Accuracy, Low Latency and Good Robustness
Due to the advantages of low energy consumption, high robustness and fast inference speed, Spiking Neural Networks (SNNs), with good biological interpretability and the potential to be applied on neuromorphic hardware, are regarded as the third generation of Artificial Neural Networks (ANNs). Despite having so many advantages, the biggest challenge encountered by spiking neural networks is training difficulty caused by the non-differentiability of spike signals. ANN-SNN conversion is an effective method that solves the training difficulty by converting parameters in ANNs to those in SNNs through a specific algorithm. However, the ANN-SNN conversion method also suffers from accuracy degradation and long inference time. In this paper, we reanalyzed the relationship between Integrate-and-Fire (IF) neuron model and ReLU activation function, proposed a StepReLU activation function more suitable for SNNs under membrane potential encoding, and used it to train ANNs. Then we converted the ANNs to SNNs with extremely small conversion error and introduced leakage mechanism to the SNNs and get the final models, which have high accuracy, low latency and good robustness, and have achieved the state-of-the-art performance on various datasets such as CIFAR and ImageNet.
Wednesday 23rd August
10:15-11:15
KRR: Reasoning about Actions
#J5586
Data-Informed Knowledge and Strategies (Extended Abstract)
The article proposes a new approach to reasoning about knowledge and strategies in multiagent systems. It emphasizes data, not agents, as the source of strategic knowledge. The approach brings together Armstrong’s functional dependency expression from database theory, a data-informed knowledge modality based on a recent work by Baltag and van Benthem, and a newly proposed data-informed strategy modality. The main technical result is a sound and complete logical system that describes the interplay between these three logical operators.
#4607
Probabilistic Temporal Logic for Reasoning about Bounded Policies
To build a theory of intention revision for agents operating in stochastic environments, we need a logic in which we can explicitly reason about their decision-making policies and those policies’ uncertain outcomes. Towards this end, we propose PLBP, a novel probabilistic temporal logic for Markov Decision Processes that allows us to reason about policies of bounded size. The logic is designed so that its expressive power is sufficient for the intended applications, whilst at the same time possessing strong computational properties. We prove that the satisfiability problem for our logic is decidable, and that its model checking problem is PSPACE-complete. This allows us to e.g. algorithmically verify whether an agent’s intentions are coherent, or whether a specific policy satisfies safety and/or liveness properties.
#3948
Automatic Verification for Soundness of Bounded QNP Abstractions for Generalized Planning
Generalized planning (GP) studies the computation of general solutions for a set of planning problems. Computing general solutions with correctness guarantee has long been a key issue in GP. Abstractions are widely used to solve GP problems. For example, a popular abstraction model for GP is qualitative numeric planning (QNP), which extends classical planning with non-negative real variables that can be increased or decreased by some arbitrary amount. The refinement of correct solutions of sound abstractions are solutions with correctness guarantees for GP problems. Recently, Cui et al. proposed a uniform abstraction framework for GP. They gave model-theoretic definitions of sound and complete abstractions for GP problems. In this paper, based on Cui et al.’s work, we explore automatic verification of sound abstractions for GP. Firstly, we present a proof-theoretic characterization for sound abstractions. Secondly, based on the characterization, we give a first-order verifiable sufficient condition for sound abstractions with deterministic actions. Then we study how to verify the sufficient condition when the abstraction models are bounded QNPs where integer variables can be incremented or decremented by one. To this end, we develop methods to handle counting and transitive closure, which are often used to define numerical variables. Finally, we implement a sound bounded QNP abstraction verification system and report experimental results on several domains.
#4799
Safety Verification and Universal Invariants for Relational Action Bases
Modeling and verification of dynamic systems operating over a relational representation of states are increasingly investigated problems in AI, Business Process Management and Database Theory. To make these systems amenable to verification, the amount of information stored in each state needs to be bounded, or restrictions are imposed on the preconditions and effects of actions. We lift these restrictions by introducing the framework of Relational Action Bases (RABs), which generalizes existing frameworks and in which unbounded relational states are evolved through actions that can (1) quantify both existentially and universally over the data, and (2) use arithmetic constraints. We then study parameterized safety of RABs via (approximated) SMT-based backward search, singling out essential meta-properties of the resulting procedure, and showing how it can be realized by an off-the-shelf combination of existing verification modules of the state-of-the-art MCMT model checker. We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes. Finally, we show how universal invariants can be exploited to make this procedure fully correct.
#1560
Abstraction of Nondeterministic Situation Calculus Action Theories
We develop a general framework for abstracting the behavior of an agent that operates in a nondeterministic domain, i.e., where the agent does not control
the outcome of the nondeterministic actions, based on the nondeterministic situation calculus and the ConGolog programming language. We assume that
we have both an abstract and a concrete nondeterministic basic action theory, and a refinement mapping which specifies how abstract actions, decomposed into agent actions and environment reactions, are implemented by concrete ConGolog programs. This new setting supports strategic reasoning and strategy synthesis, by allowing us to quantify separately on agent actions and environment reactions. We show that if the agent has a (strong FOND) plan/strategy to achieve a goal/complete a task at the abstract level, and it can always execute the nondeterministic abstract actions to completion at the concrete level, then there exist a refinement of it that is a (strong FOND) plan/strategy to achieve the refinement of the goal/task at the concrete level.
Wednesday 23rd August
10:15-11:15
S: Heuristic Search
#5168
A Fast Maximum k-Plex Algorithm Parameterized by the Degeneracy Gap
Given a graph, the k-plex is a vertex set in which each vertex is not adjacent to at most k-1 other vertices in the set. The maximum k-plex problem, which asks for the largest k-plex from a given graph, is an important but computationally challenging problem in applications like graph search and community detection. So far, there is a number of empirical algorithms without sufficient theoretical explanations on the efficiency. We try to bridge this gap by defining a novel parameter of the input instance, g_k(G), the gap between the degeneracy bound and the size of maximum k-plex in the given graph, and presenting an exact algorithm parameterized by g_k(G). In other words, we design an algorithm with running time polynomial in the size of input graph and exponential in g_k(G) where k is a constant. Usually, g_k(G) is small and bounded by O(log(|V|)) in real-world graphs, indicating that the algorithm runs in polynomial time. We also carry out massive experiments and show that the algorithm is competitive with the state-of-the-art solvers. Additionally, for large k values such as 15 and 20, our algorithm has superior performance over existing algorithms.
#4356
Front-to-End Bidirectional Heuristic Search with Consistent Heuristics: Enumerating and Evaluating Algorithms and Bounds
Recent research on bidirectional heuristic search (BiHS) is based on the must-expand pairs theory (MEP theory), which describes which pairs of nodes must be expanded during the search to guarantee the optimality of solutions. A separate line of research in BiHS has proposed algorithms that use lower bounds that are derived from consistent heuristics during search. This paper links these two directions, providing a comprehensive unifying view and showing that both existing and novel algorithms can be derived from the MEP theory. An extended set of bounds is formulated, encompassing both previously discovered bounds and new ones. Finally, the bounds are empirically evaluated by their contribution to the efficiency of the search
#4580
Multi-objective Search via Lazy and Efficient Dominance Checks
Multi-objective search can be used to model many real-world problems that require finding Pareto optimal paths from a specified start state to a specified goal state, while considering different costmetrics such as distance, time, and fuel. The performance of multi-objective search can be improved by making dominance checking—an operation necessary to determine whether or not a path dominates another—more efficient. This was shown in practice by BOA*, a state-of-the-art bi-objective search algorithm, which outperforms previously existing bi-objective search algorithms in part because it adopts a lazy approach towards dominance checking. EMOA*, a recent multi-objective search algorithm, generalizes BOA* to more-than-two objectives using AVL trees for dominance checking.
In this paper, we first propose Linear-Time Multi-Objective A* (LTMOA*), an multi-objective search algorithm that implements a more efficient dominance checking than EMOA* using simple data structures like arrays. We then propose an even lazier approach towards dominance checking, and the resulting algorithm, LazyLTMOA*, distinguishes from EMOA* and LTMOA* by removing the dominance checking during node generation. Our experimental results show that LazyLTMOA* outperforms EMOA* by up to an order of magnitude in terms of runtime.
#3964
Efficient Object Search in Game Maps
Video games feature a dynamic environment where locations of objects (e.g., characters, equipment, weapons, vehicles etc.) frequently change within the game world. Although searching for relevant nearby objects in such a dynamic setting is a fundamental operation, this problem has received little research attention. In this paper, we propose a simple lightweight index, called Grid Tree, to store objects and their associated textual data. Our index can be efficiently updated with the underlying updates such as object movements, and supports a variety of object search queries, including k nearest neighbors (returning the k closest objects), keyword k nearest neighbors (returning the k closest objects that satisfy query keywords), and several other variants. Our extensive experimental study, conducted on standard game maps benchmarks and real-world keywords, demonstrates that our approach has up to 2 orders of magnitude faster update times for moving objects compared to state-of-the-art approaches such as navigation mesh and IR-tree. At the same time, query performance of our approach is similar to or better than that of IR-tree and up to two orders of magnitude faster than the other competitor.
#5253
Runtime Analyses of Multi-Objective Evolutionary Algorithms in the Presence of Noise
In single-objective optimization, it is well known that evolutionary algorithms also without further adjustments can stand a certain amount of noise in the evaluation of the objective function. In contrast, this question is not at all understood for multi-objective optimization.
In this work, we conduct the first mathematical runtime analysis of a simple multi-objective evolutionary algorithm (MOEA) on a classic benchmark in the presence of noise in the objective function.
We prove that when bit-wise prior noise with rate p <= alpha/n, alpha a suitable constant, is present, the simple evolutionary multi-objective optimizer (SEMO) without any adjustments to cope with noise finds the Pareto front of the OneMinMax benchmark in time O(n^2 log n), just as in the case without noise. Given that the problem here is to arrive at a population consisting of n+1 individuals witnessing the Pareto front, this is a surprisingly strong robustness to noise (comparably simple evolutionary algorithms cannot optimize the single-objective OneMax problem in polynomial time when p = omega(log(n)/n)). Our proofs suggest that the strong robustness of the MOEA stems from its implicit diversity mechanism designed to enable it to compute a population covering the whole Pareto front.
Interestingly this result only holds when the objective value of a solution is determined only once and the algorithm from that point on works with this, possibly noisy, objective value. We prove that when all solutions are reevaluated in each iteration, then any noise rate p = omega(log(n)/n^2) leads to a super-polynomial runtime. This is very different from single-objective optimization, where it is generally preferred to reevaluate solutions whenever their fitness is important and where examples are known such that not reevaluating solutions can lead to catastrophic performance losses.
#SV5608
Heuristic-Search Approaches for the Multi-Objective Shortest-Path Problem: Progress and Research Opportunities
In the multi-objective shortest-path problem we are interested in computing a path, or a set of paths that simultaneously balance multiple cost functions. This problem is important for a diverse range of applications such as transporting hazardous materials considering travel distance and risk. This family of problems is not new with results dating back to the 1970’s. Nevertheless, the significant progress made in the field of heuristic search resulted in a new and growing interest in the sub-field of multi-objective search. Consequently, in this paper we review the fundamental problems and techniques common to most algorithms and provide a general overview of the field. We then continue to describe recent work with an emphasis on new challenges that emerged and the resulting research opportunities.
Wednesday 23rd August
10:15-11:15
Early Career 2
#EC1
Large Decision Models
Over recent decades, sequential decision-making tasks are mostly tackled with expert systems and reinforcement learning. However, these methods are still incapable of being generalizable enough to solve new tasks at a low cost. In this article, we discuss a novel paradigm that leverages Transformer-based sequence models to tackle decision-making tasks, named large decision models. Starting from offline reinforcement learning scenarios, early attempts demonstrate that sequential modeling methods can be applied to train an effective policy given sufficient expert trajectories. When the sequence model goes large, its generalization ability over a variety of tasks and fast adaptation to new tasks has been observed, which is highly potential to enable the agent to achieve artificial general intelligence for sequential decision-making in the near future.
#EC9
The Importance of Human-Labeled Data in the Era of LLMs
The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning.
This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.
#EC11
Artificial Intelligence, Bias, and Ethics
Although ChatGPT attempts to mitigate bias, when instructed to translate the gender-neutral Turkish sentences “O bir doktor. O bir hemşire” to English, the outcome is biased: “He is a doctor. She is a nurse.” In 2016, we have demonstrated that language representations trained via unsupervised learning automatically embed implicit biases documented in social cognition through the statistical regularities in language corpora. Embedding associations in language, vision, and multi-modal language-vision models reveal that large-scale sociocultural data is a source of implicit human biases regarding gender, race or ethnicity, skin color, ability, age, sexuality, religion, social class, and intersectional associations. The study of gender bias in language, vision, language-vision, and generative AI has highlighted the sexualization of women and girls in AI, while easily accessible generative AI models such as text-to-image generators amplify bias at scale. As AI increasingly automates tasks that determine life’s outcomes and opportunities, the ethics of AI bias has significant implications for human cognition, society, justice, and the future of AI. Thus, it is crucial to advance our understanding of the depth and prevalence of bias in AI to mitigate it both in machines and society.
#EC4
A Pathway Towards Responsible AI Generated Content
AI Generated Content (AIGC) has received tremendous attention within the past few years, with content ranging from image, text, to audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much criticism regarding its responsible usage. In this article, we focus on three main concerns that may hinder the healthy development and deployment of AIGC in practice, including risks from privacy; bias, toxicity, misinformation; and intellectual property (IP). By documenting known and potential risks, as well as any possible misuse scenarios of AIGC, the aim is to sound the alarm of potential risks and misuse, help society to eliminate obstacles, and promote the more ethical and secure deployment of AIGC.
Wednesday 23rd August
10:15-11:15
AI for Social Good – NLP
#AI4SG5772
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation
Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, these generated explanations’ effectiveness and potential limitations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged content by both users and content moderators. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conducted an extensive survey on evaluating such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and a survey was conducted with 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness, (2) the persuasive nature of these explanations, however, varied depending on the prompting strategy employed, and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution in applying LLM-generated explanations for content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.
#AI4SG5778
Mimicking the Thinking Process for Emotion Recognition in Conversation with Prompts and Paraphrasing
Emotion recognition in conversation, which aims to predict the emotion for all utterances, has attracted considerable research attention in recent years. It is a challenging task since the recognition of the emotion in one utterance involves many complex factors, such as the conversational context, the speaker’s background, and the subtle difference between emotion labels. In this paper, we propose a novel framework which mimics the thinking process when modeling these factors. Specifically, we first comprehend the conversational context with a history-oriented prompt to selectively gather information from predecessors of the target utterance. We then model the speaker’s background with an experience-oriented prompt to retrieve the similar utterances from all conversations. We finally differentiate the subtle label semantics with a paraphrasing mechanism to elicit the intrinsic label related knowledge.
We conducted extensive experiments on three benchmarks. The empirical results demonstrate the superiority of our proposed framework over the state-of-the-art baselines.
#AI4SG5783
Intensity-Valued Emotions Help Stance Detection of Climate Change Twitter Data
Our study focuses on the United Nations Sustainable Development Goal 13: Climate Action, by identifying public attitudes on Twitter about climate change. Public consent and participation is the key factor in dealing with climate crises. However, discussions about climate change on Twitter are often influenced by the polarised beliefs that shape the discourse and divide it into communities of climate change deniers and believers. In our work, we propose a framework that helps identify different attitudes in tweets about climate change (deny, believe, ambiguous). Previous literature often lacks an efficient architecture or ignores the characteristics of climate-denier tweets. Moreover, the presence of various emotions with different levels of intensity turns out to be relevant for shaping discussions on climate change. Therefore, our paper utilizes emotion recognition and emotion intensity prediction as auxiliary tasks for our main task of stance detection. Our framework injects the words affecting the emotions embedded in the tweet to capture the overall representation of the attitude in terms of the emotions associated with it. The final task-specific and shared feature representations are fused with efficient embedding and attention techniques to detect the correct attitude of the tweet. Extensive experiments on our novel curated dataset, two publicly available climate change datasets (ClimateICWSM-2023 and ClimateStance-2022), and a benchmark dataset for stance detection (SemEval-2016) validate the effectiveness of our approach.
#AI4SG5840
GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost
Large pre-trained models have revolutionized natural language processing (NLP) research and applications, but high training costs and limited data resources have prevented their benefits from being shared equally amongst speakers of all the world’s languages. To address issues of cross-linguistic access to such models and reduce energy consumption for sustainability during large-scale model training, this study proposes an effective and energy-efficient framework called GreenPLM that uses bilingual lexicons to directly “translate” pre-trained language models of one language into another at almost no additional cost. We validate this approach in 18 languages’ BERT models and show that this framework is comparable to, if not better than, other heuristics with high training costs. In addition, given lightweight continued pre-training on limited data where available, this framework outperforms the original monolingual language models in six out of seven tested languages with up to 200x less pre-training. Aiming at the Leave No One Behind Principle (LNOB), our approach manages to reduce inequalities between languages and energy consumption greatly. We make our code and models publicly available.
#AI4SG5865
Promoting Gender Equality through Gender-biased Language Analysis in Social Media
Gender bias is a pervasive issue that impacts women’s and marginalized groups’ ability to fully participate in social, economic, and political spheres. This study introduces a novel problem of Gender-biased Language Identification and Extraction (GLIdE) from social media interactions and develops a multi-task deep framework that detects gender-biased content and identifies connected causal phrases from the text using emotional information that is present in the input. The method uses a zero-shot strategy with emotional information and a mechanism to represent gender-stereotyped information as a knowledge graph. In this work, we also introduce the first-of-its-kind Gender-biased Analysis Corpus (GAC) of 12,432 social media posts and improve the best-performing baseline for gender-biased language identification and extraction tasks by margins of 4.88% and 5 ROS points, demonstrating this through empirical evaluation and extensive qualitative analysis. By improving the accuracy of identifying and analyzing gender-biased language, this work can contribute to achieving gender equality and promoting inclusive societies, in line with the United Nations Sustainable Development Goals (UN SDGs) and the Leave No One Behind principle (LNOB). We adhere to the principles of transparency and collaboration in line with the UN SDGs by openly sharing our code and dataset.
#AI4SG5888
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
The problem of audio-to-text alignment has seen significant amount of research using complete supervision during training. However, this is typically not in the context of long audio recordings wherein the text being queried does not appear verbatim within the audio file. This work is a collaboration with a non-governmental organization called CARE India that collects long audio health surveys from young mothers residing in rural parts of Bihar, India. Given a question drawn from a questionnaire that is used to guide these surveys, we aim to locate where the question is asked within a long audio recording. This is of great value to African and Asian organizations that would otherwise have to painstakingly go through long and noisy audio recordings to locate questions (and answers) of interest. Our proposed framework, INDENT, uses a cross-attention-based model and prior information on the temporal ordering of sentences to learn speech embeddings that capture the semantics of the underlying spoken text. These learnt embeddings are used to retrieve the corresponding audio segment based on text queries at inference time. We empirically demonstrate the significant effectiveness (improvement in R-avg of about 3%) of our model over those obtained using text-based heuristics. We also show how noisy ASR, generated using state-of-the-art ASR models for Indian languages, yields better results when used in place of speech. INDENT, trained only on Hindi data is able to cater to all languages supported by the (semantically) shared text space. We illustrate this empirically on 11 Indic languages.
Wednesday 23rd August
11:45-12:45
Machine Learning (4/12)
#4401
A Logic-based Approach to Contrastive Explainability for Neurosymbolic Visual Question Answering
Visual Question Answering (VQA) is a well-known problem for which deep-learning is key. This poses a challenge for explaining answers to questions, the more if advanced notions like contrastive explanations (CEs) should be provided. The latter explain why an answer has been reached in contrast to a different one and are attractive as they focus on reasons necessary to flip a query answer. We present a CE framework for VQA that uses a neurosymbolic VQA architecture which disentangles perception from reasoning. Once the reasoning part is provided as logical theory, we use answer-set programming, in which CE generation can be framed as an abduction problem. We validate our approach on the CLEVR dataset, which we extend by more sophisticated questions to further demonstrate the robustness of the modular architecture. While we achieve top performance compared to related approaches, we can also produce CEs for explanation, model debugging, and validation tasks, showing the versatility of the declarative approach to reasoning.
#2144
Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels
Deep neural networks have shown promising results on a wide variety of tasks using large-scale and well-annotated training datasets. However, data collected from real-world applications can suffer from two prevalent biases, i.e., long-tailed class distribution and label noise. Previous efforts on long-tailed learning and label-noise learning can only address a single type of data bias, leading to a severe deterioration of their performance. In this paper, we propose a distance-based sample selection algorithm called Stochastic Feature Averaging (SFA), which fits a Gaussian using the exponential running average of class centroids to capture uncertainty in representation space due to label noise and data scarcity. With SFA, we detect noisy samples based on their distances to class centroids sampled from this Gaussian distribution. Based on the identified clean samples, we then propose to train an auxiliary balanced classifier to improve the generalization for the minority class and facilitate the update of Gaussian parameters. Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. Further, we propose to combine SFA with the sample-selection approach, distribution-robust, and noise-robust loss functions, resulting in significant improvement in performance over the baselines. Our code is available at https://github.com/HotanLee/SFA
#3365
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
Reinforcement Learning’s (RL) ubiquity has instigated research on potential threats to its training and deployment. Many works study single-learner training-time attacks that “pre-programme” behavioral triggers into a strategy. However, attacks on collections of learning agents remain largely overlooked. We remedy the situation by developing a constructive training-time attack on a population of learning agents and additionally make the attack agnostic to the population’s size. The attack constitutes a sequence of environment (re)parameterizations (poisonings), generated to overcome individual differences between agents and lead the entire population to the same target behavior while minimizing effective environment modulation. Our method is demonstrated on populations of independent learners in “ghost” environments (learners do not interact or perceive each other) as well as environments with mutual awareness, with or without individual learning. From the attack perspective, we pursue an ultra-blackbox setting, i.e., the attacker’s training utilizes only across-policy traces of the victim learners for both attack conditioning and evaluation. The resulting uncertainty in population behavior is managed via a novel Wasserstein distance-based Gaussian embedding of behaviors detected within the victim population. To align with prior works on environment poisoning, our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable. Code and Appendices are available at “bit.ly/github-rb-cep”.
#3281
Some General Identification Results for Linear Latent Hierarchical Causal Structure
We study the problem of learning hierarchical causal structure among latent variables from measured variables. While some existing methods are able to recover the latent hierarchical causal structure, they mostly suffer from restricted assumptions, including the tree-structured graph constraint, no “triangle” structure, and non-Gaussian assumptions. In this paper, we relax these restrictions above and consider a more general and challenging scenario where the beyond tree-structured graph, the “triangle” structure, and the arbitrary noise distribution are allowed. We investigate the identifiability of the latent hierarchical causal structure and show that by using second-order statistics, the latent hierarchical structure can be identified up to the Markov equivalence classes over latent variables. Moreover, some directions in the Markov equivalence classes of latent variables can be further identified using partially non-Gaussian data. Based on the theoretical results above, we design an effective algorithm for learning the latent hierarchical causal structure. The experimental results on synthetic data verify the effectiveness of the proposed method.
#4184
Towards Sharp Analysis for Distributed Learning with Random Features
In recent studies, the generalization properties for distributed learning and random features assumed the existence of the target concept over the hypothesis space. However, this strict condition is not applicable to the more common non-attainable case. In this paper, using refined proof techniques, we first extend the optimal rates for distributed learning with random features to the non-attainable case. Then, we reduce the number of required random features via data-dependent generating strategy, and improve the allowed number of partitions with additional unlabeled data. Theoretical analysis shows these techniques remarkably reduce computational cost while preserving the optimal generalization accuracy under standard assumptions. Finally, we conduct several experiments on both simulated and real-world datasets, and the empirical results validate our theoretical findings.
#5195
More for Less: Safe Policy Improvement with Stronger Performance Guarantees
In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated.
State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy’s performance.
We present a novel approach to the SPI problem that provides the means to require less data for such guarantees.
Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI.
Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.
Wednesday 23rd August
11:45-12:45
Machine Learning (5/12)
#1238
c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization
Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as memory usage, or latency on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on 81 expensive HPO with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. See https://arxiv.org/abs/2211.14411 for the latest version with Appendix.
#541
On Approximating Total Variation Distance
Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results.
1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms.
2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
#2758
Learning Survival Distribution with Implicit Survival Function
Survival analysis aims at modeling the relationship between covariates and event occurrence with some untracked (censored) samples. In implementation, existing methods model the survival distribution with strong assumptions or in a discrete time space for likelihood estimation with censorship, which leads to weak generalization. In this paper, we propose Implicit Survival Function (ISF) based on Implicit Neural Representation for survival distribution estimation without strong assumptions, and employ numerical integration to approximate the cumulative distribution function for prediction and optimization. Experimental results show that ISF outperforms the state-of-the-art methods in three public datasets and has robustness to the hyperparameter controlling estimation precision.
#882
Incomplete Multi-view Clustering via Prototype-based Imputation
In this paper, we study how to achieve two characteristics highly-expected by incomplete multi-view clustering (IMvC). Namely, i) instance commonality refers to that within-cluster instances should share a common pattern, and ii) view versatility refers to that cross-view samples should own view-specific patterns. To this end, we design a novel dual-stream model which employs a dual attention layer and a dual contrastive learning loss to learn view-specific prototypes and model the sample-prototype relationship. When the view is missed, our model performs data recovery using the prototypes in the missing view and the sample-prototype relationship inherited from the observed view. Thanks to our dual-stream model, both cluster- and view-specific information could be captured, and thus the instance commonality and view versatility could be preserved to facilitate IMvC. Extensive experiments demonstrate the superiority of our method on five challenging benchmarks compared with 11 approaches. The code could be accessed from https://pengxi.me.
#1748
Deep Partial Multi-Label Learning with Graph Disambiguation
In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have been prevalent to deal with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.
#856
Graph Sampling-based Meta-Learning for Molecular Property Prediction
Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecule and properties are nodes, while property labels decide edges. Then, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.
Wednesday 23rd August
11:45-12:45
Computer Vision (3/6)
#2321
Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization
The key to video action detection lies in the understanding of interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimate the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full utilization of it still remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers with different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relation across continuing video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.
#SC6
Translating Images into Maps (Extended Abstract)
We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird’s-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image, and rays passing through the camera location in an overhead map. This lets us formulate map generation from an image as a set of sequence-to-sequence translations. This constrained formulation, based upon a strong physical grounding of the problem, leads to a restricted transformer network that is convolutional in the horizontal direction only. The structure allows us to make efficient use of data when training, and obtains state-of-the-art results for instantaneous mapping of three large-scale datasets, including a 15\% and 30\% relative gain against existing best performing methods on the nuScenes and Argoverse datasets, respectively.
#639
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns. Compared to the existing methods that only focus on the discrepant-specific patterns (\eg, noises, textures, and frequencies), our method has a greater generalization. Specifically, we first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns. DisGE consists of two branches, where the mainstream backbone branch is used to extract general semantic features, and the accessorial discrepant external attention branch is used to extract explicit forgery cues. Besides, a Double-Head Reconstruction (DouHR) module is proposed to enhance genuine compact visual patterns in different granular spaces. Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns, such that the forgery detection capability on unknown patterns can be improved. Extensive experimental results on four challenging datasets validate the effectiveness of our proposed method against state-of-the-art competitors.
#3357
Spatially Constrained Adversarial Attack Detection and Localization in the Representation Space of Optical Flow Networks
Optical flow estimation have shown significant improvements with advances in deep neural networks. However, these flow networks have recently been shown to be vulnerable to patch-based adversarial attacks, which poses security risks in real-world applications, such as self-driving cars and robotics. We propose SADL, a Spatially constrained adversarial Attack Detection and Localization framework, to detect and localize these patch-based attack without requiring a dedicated training. The detection of an attacked input sequence is performed via iterative optimization on the features from the inner layers of flow networks, without any prior knowledge of the attacks. The novel spatially constrained optimization ensures that the detected anomalous subset of features comes from a local region. To this end, SADL provides a subset of nodes within a spatial neighborhood that contribute more to the detection, which will be utilized to localize the attack in the input sequence. The proposed SADL is validated across multiple datasets and flow networks. With patch attacks 4.8% of the size of the input image resolution on RAFT, our method successfully detects and localizes them with an average precision of 0.946 and 0.951 for KITTI-2015 and MPI-Sintel datasets, respectively. The results show that SADL consistently achieves higher detection rates than existing methods and provides new localization capabilities.
#1542
Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what do those learnable prompts learn remains unexplained. In this work, we explore whether prompts in the fine-tuning can learn knowledge-aware prompts from the pre-training, by designing two different sets of prompts in pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt) approach for video captioning, which first efficiently pre-train a video-language model to extract key information (e.g., actions and objects) with flexibly generated Knowledge-Aware Prompt (KAP). Then, we design a Video-Language Prompt (VLP) to transfer the knowledge from the knowledge-aware prompts and fine-tune the model to generate full captions. Experimental results show the superior performance of our approach over several state-of-the-art baselines. We further demonstrate that the video-language prompts are well learned from the knowledge-aware prompts.
#4226
Depth-Relative Self Attention for Monocular Depth Estimation
Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased on RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of similar depth can become more likely to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased to RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.
Wednesday 23rd August
11:45-12:45
CV: Segmentation (2/2)
#774
RFENet: Towards Reciprocal Feature Evolution for Glass Segmentation
Glass-like objects are widespread in daily life but remain intractable to be segmented for most existing methods. The transparent property makes it difficult to be distinguished from background, while the tiny separation boundary further impedes the acquisition of their exact contour. In this paper, by revealing the key co-evolution demand of semantic and boundary learning, we propose a Selective Mutual Evolution (SME) module to enable the reciprocal feature learning between them. Then to exploit the global shape context, we propose a Structurally Attentive Refinement (SAR) module to conduct a fine-grained feature refinement for those ambiguous points around the boundary. Finally, to further utilize the multi-scale representation, we integrate the above two modules into a cascaded structure and then introduce a Reciprocal Feature Evolution Network (RFENet) for effective glass-like object segmentation. Extensive experiments demonstrate that our RFENet achieves state-of-the-art performance on three popular public datasets. Code is available at https://github.com/VankouF/RFENet.
#2883
Locate, Refine and Restore: A Progressive Enhancement Network for Camouflaged Object Detection
Camouflaged Object Detection (COD) aims to segment objects that blend in with their surroundings. Most existing methods mainly tackle this issue by a single-stage framework, which tends to degrade performance in the face of small objects, low-contrast objects, and objects with diverse appearances. In this paper, we propose a novel Progressive Enhancement Network (PENet) for COD by imitating the human visual detection system, which follows a three-stage detection process: locate objects, refine textures, and restore boundary. Specifically, our PENet contains three key modules, i.e., the object location module (OLM), the group attention module (GAM), and the context feature restoration module (CFRM). The OLM is designed to position the object globally, the GAM is developed to refine both high-level semantic and low-level texture feature representation, and the CFRM is leveraged to effectively aggregate multi-level features for progressively restoring the clear boundary. Extensive results demonstrate that our PENet significantly outperforms the 32 state-of-the-art methods on four widely used benchmark datasets.
#795
FGNet: Towards Filling the Intra-class and Inter-class Gaps for Few-shot Segmentation
Current few-shot segmentation (FSS) approaches have made tremendous achievements based on prototypical learning techniques. However, due to the scarcity of the support data provided, FSS methods still suffer from the intra-class and inter-class gaps. In this paper, we propose a uniform network to fill both the gaps, termed FGNet. It consists of the novel design of a Self-Adaptive Module (SAM) to emphasize the query feature to generate an enhanced prototype for self-alignment. Such a prototype caters to each query sample itself since it contains the underlying intra-instance information, which gets around the intra-class appearance gap. Moreover, we design an Inter-class Feature Separation Module (IFSM) to separate the feature space of the target class from other classes, which contributes to bridging the inter-class gap. In addition, we present several new losses and a method termed B-SLIC, which help to further enhance the separation performance of FGNet. Experimental results show that FGNet reduces both the gaps for FSS by SAM and IFSM respectively, and achieves state-of-the-art performances on both PASCAL-5i and COCO-20i datasets compared with previous top-performing approaches.
#2727
Hierarchical Semantic Contrast for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS) with image-level annotations has achieved great processes through class activation map (CAM). Since vanilla CAMs are hardly served as guidance to bridge the gap between full and weak supervision, recent studies explore semantic representations to make CAM fit for WSSS and demonstrate encouraging results. However, they generally exploit single-level semantics, which may hamper the model to learn a comprehensive semantic structure. Motivated by the prior that each image has multiple levels of semantics, we propose hierarchical semantic contrast (HSC) to ameliorate the above problem. It conducts semantic contrast from coarse-grained to fine-grained perspective, including ROI level, class level, and pixel level, making the model learn a better object pattern understanding. To further improve CAM quality, building upon HSC, we explore consistency regularization of cross supervision and develop momentum prototype learning to utilize abundant semantics across different images. Extensive studies manifest that our plug-and-play learning paradigm, HSC, can significantly boost CAM quality on both non-saliency-guided and saliency-guided baselines, and establish new state-of-the-art WSSS performance on PASCAL VOC 2012 dataset. Code is available at https://github.com/Wu0409/HSC_WSSS.
#2788
Video Object Segmentation in Panoptic Wild Scenes
In this paper, we introduce semi-supervised video object segmentation (VOS) to panoptic wild scenes and present a large-scale benchmark as well as a baseline method for it. Previous benchmarks for VOS with sparse annotations are not sufficient to train or evaluate a model that needs to process all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers various real-world object categories which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which associates multiple objects by panoptic identification in a pyramid architecture on multiple scales. Experimental results show that VIPOSeg can not only boost the performance of VOS models by panoptic training but also evaluate them comprehensively in panoptic scenes. Previous methods for classic VOS still need to improve in performance and efficiency when dealing with panoptic scenes, while our PAOT achieves SOTA performance with good efficiency on VIPOSeg and previous VOS benchmarks. PAOT also ranks 1st in the VOT2022 challenge. Our dataset and code are available at https://github.com/yoxu515/VIPOSeg-Benchmark.
Wednesday 23rd August
11:45-12:45
CV: Recognition (Object Detection, Categorization) (2/3)
#2753
Low-Confidence Samples Mining for Semi-supervised Object Detection
Reliable pseudo labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo labels with high confidence, which ignore valuable pseudo labels with lower confidence. Additionally, the insufficient excavation for unlabeled data results in an excessively low recall rate thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low confidence pseudo labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch on account of low-resolution feature maps to extract reliable large area instances, the IoUs of which are higher than small area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutually learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.
#5096
Dual Relation Knowledge Distillation for Object Detection
Knowledge distillation is an effective method for model compression. However, it is still a challenging topic to apply knowledge distillation to detection tasks. There are two key points resulting in poor distillation performance for detection tasks. One is the serious imbalance between foreground and background features, another one is that small object lacks enough feature representation. To solve the above issues, we propose a new distillation method named dual relation knowledge distillation (DRKD), including pixel-wise relation distillation and instance-wise relation distillation. The pixel-wise relation distillation embeds pixel-wise features in the graph space and applies graph convolution to capture the global pixel relation. By distilling the global pixel relation, the student detector can learn the relation between foreground and background features, and avoid the difficulty of distilling features directly for the feature imbalance issue. Besides, we find that instance-wise relation supplements valuable knowledge beyond independent features for small objects. Thus, the instance-wise relation distillation is designed, which calculates the similarity of different instances to obtain a relation matrix. More importantly, a relation filter module is designed to highlight valuable instance relations. The proposed dual relation knowledge distillation is general and can be easily applied for both one-stage and two-stage detectors. Our method achieves state-of-the-art performance, which improves Faster R-CNN based on ResNet50 from 38.4% to 41.6% mAP and improves RetinaNet based on ResNet50 from 37.4% to 40.3% mAP on COCO 2017.
#952
Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods
Object detection on panoramic/spherical images has been developed rapidly in the past few years, where IoU-calculator is a fundamental part of various detector components, i.e. Label Assignment, Loss and NMS. Due to the low efficiency and non-differentiability of spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key of these approximate methods is to map spherical boxes to planar boxes. However, there exists two problems in these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. They lead to the low accuracy of these methods. Taking the two problems into account, we propose a new sphere-plane boxes transform, called Sph2Pob. Based on the Sph2Pob, we propose (1) an differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time-cost and high accuracy and (2) an agent Loss, Sph2Pob-Loss, for spherical detection with high flexibility and expansibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.
#2358
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Vision model have gained increasing attention due to their simplicity and efficiency in Scene Text Recognition (STR) task. However, due to lacking the perception of linguistic knowledge and information, recent vision models suffer from two problems: (1) the pure vision-based query results in attention drift, which usually causes poor recognition and is summarized as linguistic insensitive drift (LID) problem in this paper. (2) the visual feature is suboptimal for the recognition in some vision-missing cases (e.g. occlusion, etc.). To address these issues, we propose a Linguistic Perception Vision model (LPV), which explores the linguistic capability of vision model for accurate text recognition. To alleviate the LID problem, we introduce a Cascade Position Attention (CPA) mechanism that obtains high-quality and accurate attention maps through step-wise optimization and linguistic information mining. Furthermore, a Global Linguistic Reconstruction Module (GLRM) is proposed to improve the representation of visual features by perceiving the linguistic information in the visual space, which gradually converts visual features into semantically rich ones during the cascade process. Different from previous methods, our method obtains SOTA results while keeping low complexity (92.4% accuracy with only 8.11M parameters). Code is available at https://github.com/CyrilSterling/LPV.
Wednesday 23rd August
11:45-12:45
DM: Recommender Systems
#4371
Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation
In large-scale e-commerce live-stream recommendation, streamers are classified into different levels based on their popularity and other metrics for marketing. Several top streamers at the head level occupy a considerable amount of exposure, resulting in an unbalanced data distribution. A unified model for all levels without consideration of imbalance issue can be biased towards head streamers and neglect the conflicts between levels. The lack of inter-level streamer correlations and intra-level streamer characteristics modeling imposes obstacles to estimating the user behaviors. To tackle these challenges, we propose a curriculum multi-level learning framework for imbalanced recommendation. We separate model parameters into shared and level-specific ones to explore the generality among all levels and discrepancy for each level respectively. The level-aware gradient descent and a curriculum sampling scheduler are designed to capture the de-biased commonalities from all levels as the shared parameters. During the specific parameters training, the hardness-aware learning rate and an adaptor are proposed to dynamically balance the training process. Finally, shared and specific parameters are combined to be the final model weights and learned in a cooperative training framework. Extensive experiments on a live-stream production dataset demonstrate the superiority of the proposed framework.
#109
Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning
Conversational recommendation systems (CRS) aim to timely and proactively acquire user dynamic preferred attributes through conversations for item recommendation. In each turn of CRS, there naturally have two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) to estimate the effectiveness of the director’s option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director’s option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
#736
Sequential Recommendation with Probabilistic Logical Reasoning
Deep learning and symbolic learning are two frequently employed methods in Sequential Recommendation (SR). Recent neural-symbolic SR models demonstrate their potential to enable SR to be equipped with concurrent perception and cognition capacities. However, neural-symbolic SR remains a challenging problem due to open issues like representing users and items in logical reasoning. In this paper, we combine the Deep Neural Network (DNN) SR models with logical reasoning and propose a general framework named Sequential Recommendation with Probabilistic Logical Reasoning (short for SR-PLR). This framework allows SR-PLR to benefit from both similarity matching and logical reasoning by disentangling feature embedding and logic embedding in the DNN and probabilistic logic network. To better capture the uncertainty and evolution of user tastes, SR-PLR embeds users and items with a probabilistic method and conducts probabilistic logical reasoning on users’ interaction patterns. Then the feature and logic representations learned from the DNN and logic network are concatenated to make the prediction. Finally, experiments on various sequential recommendation models demonstrate the effectiveness of the SR-PLR. Our code is available at https://github.com/Huanhuaneryuan/SR-PLR.
#87
Self-supervised Graph Disentangled Networks for Review-based Recommendation
User review data is considered as auxiliary information to alleviate the data sparsity problem and improve the quality of learned user/item or interaction representations in review-based recommender systems. However, existing methods usually model user-item interactions in a holistic manner and neglect the entanglement of the latent intents behind them, e.g., price, quality, or appearance, resulting in suboptimal representations and reducing interpretability. In this paper, we propose a Self-supervised Graph Disentangled Networks for review-based recommendation (SGDN), to separately model the user-item interactions based on the latent factors through the textual review data. To this end, we first model the distributions of interactions over latent factors from both semantic information in review data and structural information in user-item graph data, forming several factor graphs. Then a factorized message passing mechanism is designed to learn disentangled user/item and interaction representations on the factor graphs. Finally, we set an intent-aware contrastive learning task to alleviate the sparsity issue and encourage disentanglement through dynamically identifying positive and negative samples based on the learned intent distributions. Empirical results over five benchmark datasets validate the superiority of SGDN over the state-of-the-art methods and the interpretability of learned intent factors.
#SV5509
A Survey on User Behavior Modeling in Recommender Systems
User Behavior Modeling (UBM) plays a critical role in user interest learning, which has been extensively used in recommender systems. Crucial interactive patterns between users and items have been exploited, which brings compelling improvements in many recommendation tasks. In this paper, we attempt to provide a thorough survey of this research topic. We start by reviewing the research background of UBM. Then, we provide a systematic taxonomy of existing UBM research works, which can be categorized into four different directions including Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information. Within each direction, representative models and their strengths and weaknesses are comprehensively discussed. Besides, we elaborate on the industrial practices of UBM methods with the hope of providing insights into the application value of existing UBM solutions. Finally, we summarize the survey and discuss the future prospects of this field.
#1796
Basket Representation Learning by Hypergraph Convolution on Repeated Items for Next-basket Recommendation
Basket representation plays an important role in the task of next-basket recommendation. However, existing methods generally adopts pooling operations to learn a basket’s representation, from which two critical issues can be identified.
First, they treat a basket as a set of items independent and identically distributed. We find that items occurring in the same basket have much higher correlations than those randomly selected by conducting data analysis on a real dataset.
Second, although some works have recognized the importance of items repeatedly purchased in multiple baskets, they ignore the correlations among the repeated items in a same basket, whose importance is shown by our data analysis. In this paper, we propose a novel Basket Representation Learning (BRL) model by leveraging the correlations among intra-basket items. Specifically, we first connect all the items (in a basket) as a hyperedge, where the correlations among different items can be well exploited by hypergraph convolution operations. Meanwhile, we also connect all the repeated items in the same basket as a hyperedge, whereby their correlations can be further strengthened. We generate a negative (positive) view of the basket by data augmentation on repeated (non-repeated) items, and apply contrastive learning to force more agreements on repeated items. Finally, experimental results on three real datasets show that our approach performs better than eight baselines in ranking accuracy.
Wednesday 23rd August
11:45-12:45
Natural Language Processing (2/4)
#4870
TITAN : Task-oriented Dialogues with Mixed-Initiative Interactions
In multi-domain task-oriented dialogue systems, users proactively propose a series of domain-specific requests that can often be under-or over-specified, sometimes with ambiguous and cross-domain demands. System-sided initiative would be necessary to identify certain situations and appropriately interact with users to resolve them. However, most existing task-oriented dialogue systems fail to consider such mixed-initiative interaction strategies, performing low efficiency and poor collaboration ability in human-computer conversation. In this paper, we construct a multi-domain task-oriented dialogue dataset with mixed-initiative strategies named TITAN from the large-scale dialogue corpus MultiWOZ 2.1. It contains a total of 1,800 human-human conversations where the system can either ask clarification questions actively or provides relevant information to address failure situations and implicit user requests. We report the results of several baseline models on system response generation and dialogue act prediction to assess the performance of SOTA methods on TITAN. These models can capture mixed-initiative dialogue acts, while remaining the deficiency to actively generate implicit requests and accurately provide alternative information, suggesting ample room for improvement in future studies.
#2511
Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer
Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. In order to help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or by using end-to-end generation with PLMs.
However, these methods focus on only the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. Such uncorrelation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During insertion procedure, new tokens will be added taking both semantic and syntactic factors into account. Hence the resulting sentence can retain the syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., Ontonotes and Wikiann, demonstrate the comparable performance of SAINT over the state-of-the-art baselines.
#3260
Beyond Pure Text: Summarizing Financial Reports Based on Both Textual and Tabular Data
Abstractive text summarization is to generate concise summaries that well preserve both salient information and the overall semantic meanings of the given documents. However, real-world documents, e.g., financial reports, generally contain rich data such as charts and tabular data which invalidates most existing text summarization approaches. This paper is thus motivated to propose this novel approach to simultaneously summarize both textual and tabular data. Particularly, we first manually construct a “table+text → summary” dataset. Then, the tabular data is respectively embedded in a row-wise and column-wise manner, and the textual data is encoded at the sentence-level via an employed pre-trained model. We propose a salient detector gate respectively performed between each pair of row/column and sentence embeddings. The highly correlated content is considered as salient information that must be summarized. Extensive experiments have been performed on our constructed dataset and the promising results demonstrate the effectiveness of the proposed approach w.r.t. a number of both automatic and human evaluation criteria.
#3222
Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models
Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, we zero (prune) unimportant model weights and train a derived compact model to improve final accuracy. For pruned weights, we treat them as useless and discard them. This usually leads to significant model accuracy degradation. In this paper, we focus on attention head pruning as head attention is a key component of the transformer-based language models and provides interpretable knowledge meaning. We reveal the relationship between pruned attention heads and retained heads and provide a solution to recycle the discarded knowledge from the pruned heads, named peer distillation. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, freeing the time-consuming human labor in tuning hyperparameters.Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing heads by over 58% on average and outperforms state-of-the-art techniques (e.g., Random, HISP, $L0$ Norm, SMP).
#747
Learning Few-shot Sample-set Operations for Noisy Multi-label Aspect Category Detection
Multi-label Aspect Category Detection (MACD) is essential for aspect-based sentiment analysis, which aims to identify multiple aspect categories in a given sentence. Few-shot MACD is critical due to the scarcity of labeled data. However, MACD is a high-noise task, and existing methods fail to address it with only two or three training samples per class, which limits the application in practice. To solve above issues, we propose a group of Few-shot Sample-set Operations (FSO) to solve noisy MACD in fewer sample scenarios by identifying the semantic contents of samples. Learning interactions among intersection, subtraction, and union networks, the FSO imitates arithmetic operations on samples to distinguish relevant and irrelevant aspect contents. Eliminating the negative effect caused by noises, the FSO extracts discriminative prototypes and customizes a dedicated query vector for each class. Besides, we design a multi-label architecture, which integrates with score-wise loss and multi-label loss to optimize the FSO for multi-label prediction, avoiding complex threshold training or selection. Experiments show that our method achieves considerable performance. Significantly, it improves by 11.01% at most and an average of 8.59% Macro-F in fewer sample scenarios.
#3822
SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension
Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval, which automatically extracts the answer from the text-form contents according to the audio-form question. However, the existing spoken question answering approaches are mainly based on synthetically generated audio-form data, which may be ineffectively applied for multi-accent spoken question answering directly in many real-world applications. In this paper, we construct a large-scale multi-accent human spoken dataset SQuAD-SRC, in order to study the problem of multi-accent spoken reading comprehension. We choose 24 native English speakers from six different countries with various English accents and construct audio-form questions to the correspondent text-form contents by the chosen speakers. The dataset consists of 98,169 spoken question answering pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of our SQuAD-SRC dataset and conduct extensive experiments on it by comparing cascaded SRC approaches and the enhanced end-to-end ones. Moreover, we explore various adaption strategies to improve the SRC performance, especially for multi-accent spoken questions.
Wednesday 23rd August
11:45-12:45
GTEP: Fair Division (1/2)
#199
Approximate Envy-Freeness in Graphical Cake Cutting
We study the problem of fairly allocating a divisible resource in the form of a graph, also known as graphical cake cutting. Unlike for the canonical interval cake, a connected envy-free allocation is not guaranteed to exist for a graphical cake. We focus on the existence and computation of connected allocations with low envy. For general graphs, we show that there is always a 1/2-additive-envy-free allocation and, if the agents’ valuations are identical, a (2+\epsilon)-multiplicative-envy-free allocation for any \epsilon > 0. In the case of star graphs, we obtain a multiplicative factor of 3+\epsilon for arbitrary valuations and 2 for identical valuations. We also derive guarantees when each agent can receive more than one connected piece. All of our results come with efficient algorithms for computing the respective allocations.
#2774
Approximating Fair Division on D-Claw-Free Graphs
We study the problem of fair allocation of indivisible goods that form a graph and the bundles that are distributed to agents are connected subgraphs of this graph. We focus on the maximin share and the proportional fairness criteria. It is well-known that allocations satisfying these criteria may not exist for many graphs including complete graphs and cycles. Therefore, it is natural to look for approximate allocations, i.e., allocations guaranteeing each agent a certain portion of the value that is satisfactory to her. In this paper we consider the class of graphs of goods which do not contain a star with d+1 edges (where d > 1) as an induced subgraph. For this class of graphs we prove that there is an allocation assigning each agent a connected bundle of value at least 1/d of her maximin share. Moreover, for the same class of graphs of goods, we show a theorem which specifies what fraction of the proportional share can be guaranteed to each agent if the values of single goods for the agents are bounded by a given fraction of this share.
#4482
Fair Division of a Graph into Compact Bundles
We study the computational complexity of fair division of indivisible items in an enriched model: there is an underlying graph on the set of items. And we have to allocate the items (i.e., the vertices of the graph) to a set of agents in such a way that (a) the allocation is fair (for appropriate notions of fairness) and (b) each agent receives a bundle of items (i.e., a subset of vertices) that induces a subgraph with a specific “nice structure.” This model has previously been studied in the literature with the nice structure being a connected subgraph. In this paper, we propose an alternative for connectivity in fair division. We introduce compact graphs, and look for fair allocations in which each agent receives a compact bundle of items. Through compactness, we attempt to capture the idea that every agent must receive a bundle of “closely related” items. We prove a host of hardness and tractability results with respect to fairness concepts such as proportionality, envy-freeness and maximin share guarantee.
#4122
Truthful Fair Mechanisms for Allocating Mixed Divisible and Indivisible Goods
We study the problem of designing truthful and fair mechanisms when allocating a mixture of divisible and indivisible goods. We first show that there does not exist an EFM (envy-free for mixed goods) and truthful mechanism in general. This impossibility result holds even if there is only one indivisible good and one divisible good and there are only two agents. Thus, we focus on some more restricted settings. Under the setting where agents have binary valuations on indivisible goods and identical valuations on a single divisible good (e.g., money), we design an EFM and truthful mechanism. When agents have binary valuations over both divisible and indivisible goods, we first show there exist EFM and truthful mechanisms when there are only two agents or when there is a single divisible good. On the other hand, we show that the mechanism maximizing Nash welfare cannot ensure EFM and truthfulness simultaneously.
#3273
Random Assignment of Indivisible Goods under Constraints
We investigate the problem of random assignment of indivisible goods, in which each agent has an ordinal preference and a constraint. Our goal is to characterize the conditions under which there always exists a random assignment that simultaneously satisfies efficiency and envy-freeness. The probabilistic serial mechanism ensures the existence of such an assignment for the unconstrained setting. In this paper, we consider a more general setting in which each agent can consume a set of items only if the set satisfies her feasibility constraint. Such constraints must be taken into account in student course placements, employee shift assignments, and so on. We demonstrate that an efficient and envy-free assignment may not exist even for the simple case of partition matroid constraints, where the items are categorized, and each agent demands one item from each category. We then identify special cases in which an efficient and envy-free assignment always exists. For these cases, the probabilistic serial cannot be naturally extended; therefore, we provide mechanisms to find the desired assignment using various approaches.
#1021
Fair Division with Two-Sided Preferences
We study a fair division setting in which a number of players are to be fairly distributed among a set of teams. In our model, not only do the teams have preferences over the players as in the canonical fair division setting, but the players also have preferences over the teams. We focus on guaranteeing envy-freeness up to one player (EF1) for the teams together with a stability condition for both sides. We show that an allocation satisfying EF1, swap stability, and individual stability always exists and can be computed in polynomial time, even when teams may have positive or negative values for players. Similarly, a balanced and swap stable allocation that satisfies a relaxation of EF1 can be computed efficiently. When teams have nonnegative values for players, we prove that an EF1 and Pareto optimal allocation exists and, if the valuations are binary, can be found in polynomial time. We also examine the compatibility between EF1 and justified envy-freeness.
Wednesday 23rd August
11:45-12:45
Agent-based and Multi-agent Systems (2/4)
#4383
Efficient and Equitable Deployment of Mobile Vaccine Distribution Centers
Vaccines have proven to be extremely effective in preventing the spread of COVID-19 and potentially ending the pandemic. Lack of access caused many people not getting vaccinated early, so states such as Virginia deployed mobile vaccination sites in order to distribute vaccines across the state. Here we study the problem of deciding where these facilities should be placed and moved over time in order to minimize the distance each person needs to travel in order to be vaccinated. Traditional facility location models for this problem fail to incorporate the fact that our facilities are mobile (i.e., they can move over time). To this end, we instead model vaccine distribution as the Dynamic k-Supplier problem and give the first approximation algorithms for this problem. We then run extensive simulations on real world datasets to show the efficacy of our methods. In particular, we find that natural baselines for Dynamic k-Supplier cannot take advantage of the mobility of the facilities, and perform worse than non-mobile k-Supplier algorithms.
#1220
GPLight: Grouped Multi-agent Reinforcement Learning for Large-scale Traffic Signal Control
The use of multi-agent reinforcement learning (MARL) methods in coordinating traffic lights (CTL) has become increasingly popular, treating each intersection as an agent. However, existing MARL approaches either treat each agent absolutely homogeneous, i.e., same network and parameter for each agent, or treat each agent completely heterogeneous, i.e., different networks and parameters for each agent. This creates a difficult balance between accuracy and complexity, especially in large-scale CTL. To address this challenge, we propose a grouped MARL method named GPLight. We first mine the similarity between agent environment considering both real-time traffic flow and static fine-grained road topology. Then we propose two loss functions to maintain a learnable and dynamic clustering, one that uses mutual information estimation for better stability, and the other that maximizes separability between groups. Finally, GPLight enforces the agents in a group to share the same network and parameters. This approach reduces complexity by promoting cooperation within the same group of agents while reflecting differences between groups to ensure accuracy. To verify the effectiveness of our method, we conduct experiments on both synthetic and real-world datasets, with up to 1,089 intersections. Compared with state-of-the-art methods, experiment results demonstrate the superiority of our proposed method, especially in large-scale CTL.
#174
Scalable Communication for Multi-Agent Reinforcement Learning via Transformer-Based Email Mechanism
Communication can impressively improve cooperation in multi-agent reinforcement learning (MARL), especially for partially-observed tasks. However, existing works either broadcast the messages leading to information redundancy, or learn targeted communication by modeling all the other agents as targets, which is not scalable when the number of agents varies. In this work, to tackle the scalability problem of MARL communication for partially-observed tasks, we propose a novel framework Transformer-based Email Mechanism (TEM). The agents adopt local communication to send messages only to the ones that can be observed without modeling all the agents. Inspired by human cooperation with email forwarding, we design message chains to forward information to cooperate with the agents outside the observation range. We introduce Transformer to encode and decode the message chain to choose the next receiver selectively. Empirically, TEM outperforms the baselines on multiple cooperative MARL benchmarks. When the number of agents varies, TEM maintains superior performance without further training.
#729
The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks
Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving.
While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements.
In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound of the correct count while significantly reducing computational requirements.
We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound.
#3002
Scalable Verification of Strategy Logic through Three-Valued Abstraction
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. On this logic several fragments have been defined to tackle this issue but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.
#J5552
A Computational Model of Ostrom’s Institutional Analysis and Development Framework (Extended Abstract)
Ostrom’s Institutional Analysis and Development (IAD) framework represents a comprehensive theoretical effort to identify and outline the variables that determine the outcome in any social interaction. Taking inspiration from it, we define the Action Situation Language (ASL), a machine-readable logical language to express the components of a multiagent interaction, with a special focus on the rules adopted by the community. The ASL is complemented by a game engine that takes an interaction description as input and automatically grounds its semantics as an Extensive-Form Game (EFG), which can be readily analysed using standard game-theoretical solution concepts. Overall, our model allows a community of agents to perform what-if analysis on a set of rules being considered for adoption, by automatically connecting rule configurations to the outcomes they incentivize.
Wednesday 23rd August
11:45-12:45
Knowledge Representation and Reasoning (3/4)
#4629
REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing
This paper considers the problem of querying dirty databases, which may contain both erroneous facts and multiple names for the same entity. While both of these data quality issues have been widely studied in isolation, our contribution is a holistic framework for jointly deduplicating and repairing data. Our REPLACE framework follows a declarative approach, utilizing logical rules to specify under which conditions a pair of entity references can or must be merged and logical constraints to specify consistency requirements. The semantics defines a space of solutions, each consisting of a set of merges to perform and a set of facts to delete, which can be further refined by applying optimality criteria. As there may be multiple optimal solutions, we use classical notions of possible and certain query answers to reason over the alternative solutions, and introduce a novel notion of most informative answer to obtain a more compact presentation of query results. We perform a detailed analysis of the data complexity of the central reasoning tasks of recognizing optimal solutions and (most informative) possible and certain answers, for each of the three notions of optimal solution and for both general and restricted specifications.
#2912
The Parameterized Complexity of Finding Concise Local Explanations
We consider the computational problem of finding a smallest local explanation (anchor) for classifying a given feature vector (example) by a black-box model. After showing that the problem is NP-hard in general, we study various natural restrictions of the problem in terms of problem parameters to see whether these restrictions make the problem fixed-parameter tractable or not. We draw a detailed and systematic complexity landscape for combinations of parameters, including the size of the anchor, the size of the anchor’s coverage, and parameters that capture structural aspects of the problem instance, including rank-width, twin-width, and maximum difference.
#SV5615
Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs
Knowledge graphs (KGs) have become valuable knowledge resources in various applications, and knowledge graph embedding (KGE) methods have garnered increasing attention in recent years. However, conventional KGE methods still face challenges when it comes to handling unseen entities or relations during model testing. To address this issue, much effort has been devoted to various fields of KGs. In this paper, we use a set of general terminologies to unify these methods and refer to them collectively as Knowledge Extrapolation. We comprehensively summarize these methods, classified by our proposed taxonomy, and describe their interrelationships. Additionally, we introduce benchmarks and provide comparisons of these methods based on aspects that are not captured by the taxonomy. Finally, we suggest potential directions for future research.
#1213
Enhancing Datalog Reasoning with Hypertree Decompositions
Datalog reasoning based on the seminaive evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental reasoning with recursive Datalog programs. Moreover, hypertree decompositions require additional data structures and thus introduce nonnegligible overhead in both runtime and memory consumption. In this paper, we provide algorithms that exploit hypertree decompositions for the materialisation and incremental evaluation of Datalog programs. Furthermore, we combine this approach with standard Datalog reasoning algorithms in a modular fashion so that the overhead caused by the decompositions is reduced. Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude.
#SV5487
A Survey on Dataset Distillation: Approaches, Applications and Future Directions
Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation, characterizing existing approaches, and then systematically reviewing the data modalities, and related applications. In addition, we summarize the challenges and discuss future directions for this field of research.
Wednesday 23rd August
11:45-12:45
CSO: Constraint Programming
#4051
Solving the Identifying Code Set Problem with Grouped Independent Support
An important problem in network science is finding an optimal placement of sensors in nodes in order to uniquely detect failures in the network. This problem can be modelled as an identifying code set (ICS) problem, introduced by Karpovsky et al. in 1998. The ICS problem aims to find a cover of a set S, such that the elements in the cover define a unique signature for each of the elements of S, and to minimise the cover’s cardinality. In this work, we study a generalised identifying code set (GICS) problem, where a unique signature must be found for each subset of S that has a cardinality of at most k (instead of just each element of S). The concept of an independent support of a Boolean formula was introduced by Chakraborty et al. in 2014 to speed up propositional model counting, by identifying a subset of variables whose truth assignments uniquely define those of the other variables.
In this work, we introduce an extended version of independent support, grouped independent support (GIS), and show how to reduce the GICS problem to the GIS problem. We then propose a new solving method for finding a GICS, based on finding a GIS. We show that the prior state-of-the-art approaches yield integer-linear programming (ILP) models whose sizes grow exponentially with the problem size and k, while our GIS encoding only grows polynomially with the problem size and k. While the ILP approach can solve the GICS problem on networks of at most 494 nodes, the GIS-based method can handle networks of up to 21 363 nodes; a ∼40× improvement. The GIS-based method shows up to a 520× improvement on the ILP-based method in terms of median solving time. For the majority of the instances that can be encoded and solved by both methods, the cardinality of the solution returned by the GIS-based method is less than 10% larger than the cardinality of the solution found by the ILP method.
#526
Constraints First: A New MDD-based Model to Generate Sentences Under Constraints
This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.
#108
A Regular Matching Constraint for String Variables
Using a regular language as a pattern for string matching is nowadays a common -and sometimes unsafe- operation, provided as a built-in feature by most programming languages. A proper constraint solver over string variables should support most of the operations over regular expressions and related constructs. However, state-of-the-art string solvers natively support only the membership relation of a string variable to a regular language. Here we take a step forward by defining a specialised propagator for the match operation, returning the leftmost position where a pattern can match a given string. Empirical evidences show the effectiveness of our approach, implemented within the constraint programming framework, and tested against state-of-the-art string solvers.
#1635
A Bitwise GAC Algorithm for Alldifferent Constraints
The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is excavating and enumerating all the strongly connected components (SCCs) of the graph model. This causes a large amount of complex data structures to maintain the node information, leading to a large overhead both in time and memory space. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. To solve this problem, the key observation of this paper is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCCs it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead, and enhances the ability to incorporate existing optimizations to work in a synergistic way. Our experiments show that AllDiffbit outperforms the state-of-the-art GAC algorithms over 60%.
#1830
New Bounds and Constraint Programming Models for the Weighted Vertex Coloring Problem
This paper addresses the weighted vertex coloring problem (WVCP) which is an NP-hard variant of the graph coloring problem with various applications.
Given a vertex-weighted graph, the problem consists of partitioning vertices in independent sets (colors) so as to minimize the sum of the maximum weights of the colors.
We first present an iterative procedure to reduce the size of WVCP instances and prove new upper bounds on the objective value and the number of colors.
Alternative constraint programming models are then introduced which rely on primal and dual encodings of the problem and use symmetry breaking constraints.
A large number of experiments are conducted on benchmark instances.
We analyze the impact of using specific bounds to reduce the search space and speed up the exact resolution of instances.
New optimality proofs are reported for some benchmark instances.
Wednesday 23rd August
11:45-12:45
Early Career 3
#EC3
Algorithmic Motion Planning Meets Minimially-Invasive Robotic Surgery
Robots for minimally-invasive surgery such as steerable needles and concentric-tube robots have the potential to dramatically alter the way common medical procedures are performed. They can decrease patient-recovery time, speed healing and reduce scarring. However, manually controlling such devices is highly un-intuitive and automatic planning methods are in need. For the automation of such medical procedures to be clinically accepted, it is critical from a patient care, safety, and regulatory perspective to certify the correctness and effectiveness of the motion-planning algorithms involved in procedure automation. In this paper, I survey recent and ongoing work where we develop efficient and effective planning capabilities for medical robots that provide provable guarantees on various planner attributes as well as introduce new and exciting research opportunities in the field.
#EC8
AI Planning for Hybrid Systems
When planning the tasks of some physical entities that need to perform actions in the world (e.g., a Robot) it is necessary to take into account quite complex models for ensuring that the plan is actually executable. Indeed the state of these systems evolves according to potentially non-linear dynamics where interdependent discrete and continuous changes happen over the entire course of the task. Systems of this kind are typically compactly represented in planning using languages mixing propositional logic and mathematics. However, these languages are still poorly understood and exploited. What are the difficulties for planning in these settings? How can we build systems that can scale up over realistically sized problems? What are the domains which can benefit from these languages? This short paper shows the main two ingredients that are needed to build a heuristic search planner, outline the main impact that such techniques have on application, and provide some open challenges. These models and relative planners hold the promise to deliver explainable AI solutions that do not rely on large amounts of data.
#EC5
Towards Formal Verification of Neuro-symbolic Multi-agent Systems
This paper outlines some of the key methods we developed towards the formal verification of multi- agent systems, covering both symbolic and connectionist systems. It discusses logic-based methods for the verification of unbounded multi-agent systems (i.e., systems composed of an arbitrary number of homogeneous agents, e.g., robot swarms), optimisation approaches for establishing the robustness of neural network models, and methods for analysing properties of neuro-symbolic multi-agent systems.
#EC7
Counting and Sampling Models in First-Order Logic
First-order model counting (FOMC) is the task of counting models of a first-order logic sentence over a given set of domain elements. Its weighted variant, WFOMC, generalizes FOMC by assigning weights to the models and has many applications in statistical relational learning. More than ten years of research by various authors has led to identification of non-trivial classes of WFOMC problems that can be solved in time polynomial in the number of domain elements. In this paper, we describe recent works on WFOMC and the related problem of weighted first-order model sampling (WFOMS). We also discuss possible applications of WFOMC and WFOMS within statistical relational learning and beyond, e.g., automated solving of problems from enumerative combinatorics and elementary probability theory. Finally, we mention research problems that still need to be tackled in order to make applications of these methods really practical more broadly.
Wednesday 23rd August
11:45-12:45
AI for Social Good Projects – Vision
#AI4SGP5850
Long-term Monitoring of Bird Flocks in the Wild
Monitoring and analysis of wildlife are key to conservation planning and conflict management. The widespread use of camera traps coupled with AI-based analysis tools serves as an excellent example of successful and non-invasive use of technology for design, planning, and evaluation of conservation policies. As opposed to the typical use of camera traps that capture still images or short videos, in this project, we propose to analyze longer term videos monitoring a large flock of birds. This project, which is part of the NSF-TIH Indo-US joint R&D partnership, focuses on solving challenges associated with the analysis of long-term videos captured at feeding grounds and nesting sites, among other such locations that host large flocks of migratory birds. We foresee that the objectives of this project would lead to datasets and benchmarking tools as well as novel algorithms that would be instrumental in developing automated video analysis tools that could in turn help understand individual and social behavior of birds. The first of the key outcomes of this research will include the curation of challenging, real-world datasets for benchmarking various image and video analytics algorithms for tasks such as counting, detection, segmentation, and tracking. Our recent efforts towards this outcome is a curated dataset of 812 high-resolution, point-annotated, images (4K – 32MP) of a flock of Demoiselle cranes (Anthropoides virgo) taken from their feeding site at Khichan, Rajasthan, India. The average number of birds in each image is about 207, with a maximum count of 1500. The benchmark experiments show that state-of-the-art vision techniques struggle with tasks such as segmentation, detection, localization, and density estimation for the proposed dataset. Over the execution of this open science research, we will be scaling this dataset for segmentation and tracking in videos, as well as developing novel techniques for video analytics for wildlife monitoring.
#AI4SGP5863
NutriAI: AI-Powered Child Malnutrition Assessment in Low-Resource Environments
Malnutrition among infants and young children is a pervasive public health concern, particularly in developing countries where resources are limited. Millions of children globally suffer from malnourishment and its complications1. Despite the best efforts of governments and organizations, malnourishment persists and remains a leading cause of morbidity and mortality among children under five. Physical measurements, such as weight, height, middle-upper-arm-circumference (muac), and head circumference are commonly used to assess the nutritional status of children. However, this approach can be resource-intensive and challenging to carry out on a large scale. In this research, we are developing NutriAI, a low-cost solution that leverages
small sample size classification approach to detect malnutrition by analyzing 2D images of the subjects in multiple poses. The proposed solution will not only reduce the workload of health workers but also provide a more efficient means of monitoring the nutritional status of children. On the dataset prepared as part of this research, the baseline results highlight that the modern deep learning approaches can facilitate malnutrition detection via anthropometric indicators in the presence of diversity with respect to age, gender, physical characteristics, and accessories including clothing.
#AI4SGP5881
On AI-Assisted Pneumoconiosis Detection from Chest X-rays
According to theWorld Health Organization, Pneumoconiosis
affects millions of workers globally,
with an estimated 260,000 deaths annually. The
burden of Pneumoconiosis is particularly high in
low-income countries, where occupational safety
standards are often inadequate, and the prevalence
of the disease is increasing rapidly. The reduced
availability of expert medical care in rural areas,
where these diseases are more prevalent, further
adds to the delayed screening and unfavourable outcomes
of the disease. This paper aims to highlight
the urgent need for early screening and detection
of Pneumoconiosis, given its significant impact on
affected individuals, their families, and societies as
a whole. With the help of low-cost machine learning
models, early screening, detection, and prevention
of Pneumoconiosis can help reduce healthcare
costs, particularly in low-income countries. In this
direction, this research focuses on designing AI solutions
for detecting different kinds of Pneumoconiosis
from chest X-ray data. This will contribute
to the Sustainable Development Goal 3 of ensuring
healthy lives and promoting well-being for all at all
ages, and present the framework for data collection
and algorithm for detecting Pneumoconiosis
for early screening. The baseline results show that
the existing algorithms are unable to address this
challenge. Therefore, it is our assertion that this
research will improve state-of-the-art algorithms of
segmentation, semantic segmentation, and classification
not only for this disease but in general medical
image analysis literature.
Wednesday 23rd August
15:30-16:50
ML: Deep reinforcement Learning (1/2)
#4714
BRExIt: On Opponent Modelling in Expert Iteration
Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt’s algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt. Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code. Supplementary material available
at https://arxiv.org/abs/2206.00113.
#SV5653
A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges
Reaction and retrosynthesis prediction are two fundamental tasks in computational chemistry. In recent years, these two tasks have attracted great attentions from both machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these two problems and achieved initial success. In this survey, we conduct a comprehensive investigation on advanced deep learning-based reaction and retrosynthesis prediction models. We first summarize the design mechanism, strengths and weaknesses of the state-of-the-art approaches. Then we further discuss limitations of current solutions and open challenges in the problem itself. Last but not the least, we present some promising directions to facilitate future research. To our best knowledge, this paper is the first comprehensive and systematic survey on unified understanding of reaction and retrosynthesis prediction.
#2873
CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing
The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, only containing crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation- and action-spaces and provide methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes.
#1022
Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution
Recently, Decision Transformer (DT) pioneered the offline RL into a contextual conditional sequence modeling paradigm, which leverages self-attended autoregression to learn from global target rewards, states, and actions. However, many applications have a severe delay of the above signals, such as the agent can only obtain a reward signal at the end of each trajectory. This delay causes an unwanted bias cumulating in autoregressive learning global signals. In this paper, we focused its virtual example on episodic reinforcement learning with trajectory feedback. We propose a new reward redistribution algorithm for learning parameterized reward functions, and it decomposes the long-delayed reward onto each timestep. To improve the redistributing’s adaptation ability, we formulate the previous decomposition as a bi-level optimization problem for global optimal. We extensively evaluate the proposed method on various benchmarks and demonstrate an overwhelming performance improvement under long-delayed settings.
#5171
MA2CL:Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning
Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. However, in multi-agent reinforcement learning (MARL), these techniques face challenges because each agent only receives partial observation from an environment influenced by others, resulting in correlated observations in the agent dimension. So it is necessary to consider agent-level information in representation learning for MARL. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space. Specifically, we use an attention reconstruction model for recovering and the model is trained via contrastive learning. MA2CL allows better utilization of contextual information at the agent level, facilitating the training of MARL agents for cooperation tasks. Extensive experiments demonstrate that our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
#2590
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concern has not been considered in existing works in MARL. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with rigorous (epsilon, delta)-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise as commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent respectively and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
#3093
Spotlight News Driven Quantitative Trading Based on Trajectory Optimization
News-driven quantitative trading (NQT) has been popularly studied in recent years. Most existing NQT methods are performed in a two-step paradigm, i.e., first analyzing markets by a financial prediction task and then making trading decisions, which is doomed to failure due to the nearly futile financial prediction task. To bypass the financial prediction task, in this paper, we focus on reinforcement learning (RL) based NQT paradigm, which leverages news to make profitable trading decisions directly. In this paper, we propose a novel NQT framework SpotlightTrader based on decision trajectory optimization, which can effectively stitch together a continuous and flexible sequence of trading decisions to maximize profits. In addition, we enhance this framework by constructing a spotlight-driven state trajectory that obeys a stochastic process with irregular abrupt jumps caused by spotlight news. Furthermore, in order to adapt to non-stationary financial markets, we propose an effective training pipeline for this framework, which blends offline pretraining with online finetuning to balance exploration and exploitation effectively during online tradings. Extensive experiments on three real-world datasets demonstrate our proposed model’s superiority over the state-of-the-art NQT methods.
#SC4
On the Versatile Uses of Partial Distance Correlation in Deep Learning
Show Abstract
Hide Abstract
Wednesday 23rd August
15:30-16:50
Machine Learning (6/12)
#1239
Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator
Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on “Multiobjective Hyperparameter Optimization for Transformers”. See https://arxiv.org/abs/2212.06751 for the latest version with Appendix.
#434
Unreliable Partial Label Learning with Recursive Separation
Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, and among which only one is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, the self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.
#1241
PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces
The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient. See https://arxiv.org/abs/2304.10255 for the latest version with Appendix.
#1889
One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction
We propose a universal Graph Neural Network architecture which can be trained as an end-2-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem specific heuristics for any CSP in a purely data driven manner.
The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs.
We perform a thorough empirical evaluation where we learn heuristics for well known and important CSPs, both decision and optimisation problems, from random data, including graph coloring, MAXCUT, and MAX-k-SAT, and the general RB model. Our approach significantly outperforms prior end-2-end approaches for neural combinatorial optimization. It can compete with conventional heuristics and solvers on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.
#2907
Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
We study the Improving Multi-Armed Bandit problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. We study the tension between two seemingly conflicting objectives in the horizon-unaware setting: a) maximizing the cumulative reward at any time and b) ensuring that arms with better long-term rewards get sufficient pulls even if they initially have low rewards. We show that, surprisingly, the two objectives are aligned with each other. Our main contribution is an anytime algorithm for the IMAB problem that achieves the best possible cumulative reward while ensuring that the arms reach their true potential given sufficient time. Our algorithm mitigates the initial disparity due to lack of opportunity and continues pulling an arm until it stops improving. We prove the optimality of our algorithm by showing that a) any algorithm for the IMAB problem, no matter how utilitarian, must suffer Omega(T) policy regret and Omega(k) competitive ratio with respect to the optimal offline policy, and b) the competitive ratio of our algorithm is O(k).
#663
Multi-level Graph Contrastive Prototypical Clustering
Recently, graph neural networks (GNNs) have drawn a surge of investigations in deep graph clustering. Nevertheless, existing approaches predominantly are inclined to semantic-agnostic since GNNs exhibit inherent limitations in capturing global underlying semantic structures. Meanwhile, multiple objectives are imposed within one latent space, whereas representations from different granularities may presumably conflict with each other, yielding severe performance degradation for clustering. To this end, we propose a novel Multi-Level Graph Contrastive Prototypical Clustering (MLG-CPC) framework for end-to-end clustering. Specifically, a Prototype Discrimination (ProDisc) objective function is proposed to explicitly capture semantic information via cluster assignments. Moreover, to alleviate the issue of objectives conflict, we introduce to perceive representations of different granularities within individual feature-, prototypical-, and cluster-level spaces by the feature decorrelation, prototype contrast, and cluster space consistency respectively. Extensive experiments on four benchmarks demonstrate the superiority of the proposed MLG-CPC against the state-of-the-art graph clustering approaches.
#SV5647
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Audio has become an increasingly crucial biometric modality due to its ability to provide an intuitive way for humans to interact with machines. It is currently being used for a range of applications including person authentication to banking to virtual assistants. Research has shown that these systems are also susceptible to spoofing and attacks. Therefore, protecting audio processing systems against fraudulent activities such as identity theft, financial fraud, and spreading misinformation, is of paramount importance. This paper reviews the current state-of-the-art techniques for detecting audio spoofing and discusses the current challenges along with open research problems. The paper further highlights the importance of considering the ethical and privacy implications of audio spoofing detection systems. Lastly, the work aims to accentuate the need for building more robust and generalizable methods, the integration of automatic speaker verification and countermeasure systems, and better evaluation protocols.
#5051
Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion
Transformer-based methods have proven to be effective in achieving long-distance modeling, capturing the spatial and spectral information, and exhibiting strong inductive bias in various computer vision tasks. Generally, the Transformer model includes two common modes of multi-head self-attention (MSA): spatial MSA (Spa-MSA) and spectral MSA (Spe-MSA). However, Spa-MSA is computationally efficient but limits the global spatial response within a local window. On the other hand, Spe-MSA can calculate channel self-attention to accommodate high-resolution images, but it disregards the crucial local information that is essential for low-level vision tasks. In this study, we propose a bidirectional dilation Transformer (BDT) for multispectral and hyperspectral image fusion (MHIF), which aims to leverage the advantages of both MSA and the latent multiscale information specific to MHIF tasks. The BDT consists of two designed modules: the dilation Spa-MSA (D-Spa), which dynamically expands the spatial receptive field through a given hollow strategy, and the grouped Spe-MSA (G-Spe), which extracts latent features within the feature map and learns local data behavior. Additionally, to fully exploit the multiscale information from both inputs with different spatial resolutions, we employ a bidirectional hierarchy strategy in the BDT, resulting in improved performance. Finally, extensive experiments on two commonly used datasets, CAVE and Harvard, demonstrate the superiority of BDT both visually and quantitatively. Furthermore, the related code will be available at the GitHub page of the authors.
Wednesday 23rd August
15:30-16:50
ML: Classification
#1758
Progressive Label Propagation for Semi-Supervised Multi-Dimensional Classification
In multi-dimensional classification (MDC), each training example is associated with multiple class variables from different class spaces. However, it is rather costly to collect labeled MDC examples which have to be annotated from several dimensions (class spaces). To reduce the labeling cost, we attempt to deal with the MDC problem under the semi-supervised learning setting. Accordingly, a novel MDC approach named PLAP is proposed to solve the resulting semi-supervised MDC problem. Overall, PLAP works under the label propagation framework to utilize unlabeled data. To further consider dependencies among class spaces, PLAP deals with each class space in a progressive manner, where the previous propagation results will be used to initialize the current propagation procedure and all processed class spaces and the current one will be regarded as an entirety. Experiments validate the effectiveness of the proposed approach.
#2362
G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available
during training. But in real-life, models are of ten applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing
open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all
of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints
of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on bench
mark graph datasets. Moreover, G2Pxy does not have specific requirement on the GNN architecture and shows good generalizations.
#4025
SSML-QNet: Scale-Separative Metric Learning Quadruplet Network for Multi-modal Image Patch Matching
Multi-modal image matching is very challenging due to the significant diversities in visual appearance of different modal images. Typically, the existing well-performed methods mainly focus on learning invariant and discriminative features for measuring the relation between multi-modal image pairs. However, these methods often take the features as a whole and largely overlook the fact that different scale features for a same image pair may have different similarity, which may lead to sub-optimal results only. In this work, we propose a Scale-Separative Metric Learning Quadruplet network (SSML-QNet) for multi-modal image patch matching. Specifically, SSML-QNet can extract both relevant and irrelevant features of imaging modality with the proposed quadruplet network architecture. Then, the proposed Scale-Separative Metric Learning module separately encodes the similarity of different scale features with the pyramid structure. And for each scale, cross-modal consistent features are extracted and measured by coordinate and channel-wise attention sequentially. This makes our network robust to appearance divergence caused by different imaging mechanism. Experiments on the benchmark dataset (VIS-NIR, VIS-LWIR, Optical-SAR, and Brown) have verified that the proposed SSML-QNet is able to outperform other state-of-the-art methods. Furthermore, the cross-dataset transferring experiments on these four datasets also have shown that the proposed method has powerful ability of cross-dataset transferring.
#1927
CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization
Ultra-fine-grained visual classification (ultra-FGVC) targets at classifying sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representation. Yet none of these works consider explicit supervision for learning mutual information at instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address the fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves the generalization ability without requiring extra manual annotations. CLE-ViT demonstrates strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task. The code is available at https://github.com/Markin-Wang/CLEViT.
#2106
Handling Learnwares Developed from Heterogeneous Feature Spaces without Auxiliary Data
The learnware paradigm proposed by Zhou [2016] devotes to constructing a market of numerous well-performed models, enabling users to solve problems by reusing existing efforts rather than starting from scratch. A learnware comprises a trained model and the specification which enables the model to be adequately identified according to the user’s requirement. Previous studies concentrated on the homogeneous case where models share the same feature space based on Reduced Kernel Mean Embedding (RKME) specification. However, in real-world scenarios, models are typically constructed from different feature spaces. If such a scenario can be handled by the market, all models built for a particular task even with different feature spaces can be identified and reused for a new user task. Generally, this problem would be easier if there were additional auxiliary data connecting different feature spaces, however, obtaining such data in reality is challenging. In this paper, we present a general framework for accommodating heterogeneous learnwares without requiring additional auxiliary data. The key idea is to utilize the submitted RKME specifications to establish the relationship between different feature spaces. Additionally, we give a matrix factorization-based implementation and propose the overall procedure for constructing and exploiting the heterogeneous learnware market. Experiments on real-world tasks validate the efficacy of our method.
#375
Spike Count Maximization for Neuromorphic Vision Recognition
Spiking Neural Networks (SNNs) are the promising models of neuromorphic vision recognition. The mean square error (MSE) and cross-entropy (CE) losses are widely applied to supervise the training of SNNs on neuromorphic datasets. However, the relevance between the output spike counts and predictions is not well modeled by the existing loss functions. This paper proposes a Spike Count Maximization (SCM) training approach for the SNN-based neuromorphic vision recognition model based on optimizing the output spike counts. The SCM is achieved by structural risk minimization (SRM) and a specially designed spike counting loss. The spike counting loss counts the output spikes of the SNN by using the L0-norm, and the SRM maximizes the distance between the margin boundaries of the classifier to ensure the generalization of the model. The SCM is non-smooth and non-differentiable, and we design a two-stage algorithm with fast convergence to solve the problem. Experiment results demonstrate that the SCM performs satisfactorily in most cases. Using the output spikes for prediction, the accuracies of SCM are 2.12%~16.50% higher than the popular training losses on the CIFAR10-DVS dataset. The code is available at https://github.com/TJXTT/SCM-SNN.
#4407
Scalable Optimal Margin Distribution Machine
Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooting in the novel margin theory, which demonstrates better generalization performance than the traditional large margin based counterparts. Nonetheless, it suffers from the ubiquitous scalability problem regarding both computation time and memory as other kernel methods. This paper proposes a scalable ODM, which can achieve nearly ten times speedup compared to the original ODM training method. For nonlinear kernels, we propose a novel distribution-aware partition method to make the local ODM trained on each partition be close and converge faster to the global one. When linear kernel is applied, we extend a communication efficient SVRG method to accelerate the training further. Extensive empirical studies validate that our proposed method is highly computational efficient and almost never worsen the generalization.
Wednesday 23rd August
15:30-16:50
CV: Computational Photography
#920
On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Pre-training has marked numerous state of the arts in high-level computer vision, while few attempts have ever been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a whole set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information to intermediate layers in super-resolution (SR), yielding significant performance gains, while pre-training hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than other alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformers and CNNs. Based on the study, we successfully develop state-of-the-art models for multiple low-level tasks.
#396
A Large-Scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Film, a classic image style, is culturally significant to the whole photographic industry since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. Numerous datasets that have emerged in the field of image enhancement so far are not film-specific. In order to facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet based on Laplacian Pyramid for stylizing images across frequency bands and achieving film style outcomes. Experiments reveal that the performance of our model is superior than state-of-the-art techniques. The link of our dataset and code is https://github.com/CXH-Research/FilmNet.
#2815
Pyramid Diffusion Models for Low-light Image Enhancement
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions. Code and supplementary materials are available at https://github.com/limuloo/PyDIff.git.
#2614
SS-BSN: Attentive Blind-Spot Network for Self-Supervised Denoising with Nonlocal Self-Similarity
Recently, numerous studies have been conducted on supervised learning-based image denoising methods. However, these methods rely on large-scale noisy-clean image pairs, which are difficult to obtain in practice. Denoising methods with self-supervised training that can be trained with only noisy images have been proposed to address the limitation. These methods are based on the convolutional neural network (CNN) and have shown promising performance. However, CNN-based methods do not consider using nonlocal self-similarities essential in the traditional method, which can cause performance limitations. This paper presents self-similarity attention (SS-Attention), a novel self-attention module that can capture nonlocal self-similarities to solve the problem. We focus on designing a lightweight self-attention module in a pixel-wise manner, which is nearly impossible to implement using the classic self-attention module due to the quadratically increasing complexity with spatial resolution. Furthermore, we integrate SS-Attention into the blind-spot network called self-similarity-based blind-spot network (SS-BSN). We conduct the experiments on real-world image denoising tasks. The proposed method quantitatively and qualitatively outperforms state-of-the-art methods in self-supervised denoising on the Smartphone Image Denoising Dataset (SIDD) and Darmstadt Noise Dataset (DND) benchmark datasets.
#924
STS-GAN: Can We Synthesize Solid Texture with High Fidelity from Arbitrary 2D Exemplar?
Solid texture synthesis (STS), an effective way to extend a 2D exemplar to a 3D solid volume, exhibits advantages in computational photography. However, existing methods generally fail to accurately learn arbitrary textures, which may result in the failure to synthesize solid textures with high fidelity. In this paper, we propose a novel generative adversarial nets-based framework (STS-GAN) to extend the given 2D exemplar to arbitrary 3D solid textures. In STS-GAN, multi-scale 2D texture discriminators evaluate the similarity between the given 2D exemplar and slices from the generated 3D texture, promoting the 3D texture generator synthesizing realistic solid textures. Finally, experiments demonstrate that the proposed method can generate high-fidelity solid textures with similar visual characteristics to the 2D exemplar.
#2094
Video Frame Interpolation with Densely Queried Bilateral Correlation
Video Frame Interpolation (VFI) aims to synthesize non-existent intermediate frames between existent frames. Flow-based VFI algorithms estimate intermediate motion fields to warp the existent frames. Real-world motions’ complexity and the reference frame’s absence make motion estimation challenging. Many state-of-the-art approaches explicitly model the correlations between two neighboring frames for more accurate motion estimation. In common approaches, the receptive field of correlation modeling at higher resolution depends on the motion fields estimated beforehand. Such receptive field dependency makes common motion estimation approaches poor at coping with small and fast-moving objects. To better model correlations and to produce more accurate motion fields, we propose the Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem and thus is more friendly to small and fast-moving objects. The motion fields generated with the help of DQBC are further refined and up-sampled with context features. After the motion fields are fixed, a CNN-based SynthNet synthesizes the final interpolated frame. Experiments show that our approach enjoys higher accuracy and less inference time than the state-of-the-art. Source code is available at https://github.com/kinoud/DQBC.
#4168
ALL-E: Aesthetics-guided Low-light Image Enhancement
Evaluating the performance of low-light image enhancement (LLE) is highly subjective, thus making integrating human preferences into image enhancement a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic criteria for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and motivates training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project pages/ALLE.html
Wednesday 23rd August
15:30-16:50
CV: Applications
#2587
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. For one thing, the universally adopted black box principle limits the model interpretability. For another thing, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting the overall performance. To address these mentioned issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems by the designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for the interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.
#1441
ViT-P3DE∗: Vision Transformer Based Multi-Camera Instance Association with Pseudo 3D Position Embeddings
Multi-camera instance association, which identifies identical objects among multiple objects in multi-view images, is challenging due to several harsh constraints. To tackle this problem, most studies have employed CNNs as feature extractors but often fail under such harsh constraints. Inspired by Vision Transformer (ViT), we first develop a pure ViT-based framework for robust feature extraction through self-attention and residual connection. We then propose two novel methods to achieve robust feature learning. First, we introduce learnable pseudo 3D position embeddings (P3DEs) that represent the 3D location of an object in the world coordinate system, which is independent of the harsh constraints. To generate P3DEs, we encode the camera ID and the object’s 2D position in the image using embedding tables. We then build a framework that trains P3DEs to represent an object’s 3D position in a weakly supervised manner. Second, we also utilize joint patch generation (JPG). During patch generation, JPG considers an object and its surroundings as a single input patch to reinforce the relationship information between two features. Ultimately, experimental results demonstrate that both ViT-P3DE and ViT-P3DE with JPG achieve state-of-the-art performance and significantly outperform existing works, especially when dealing with extremely harsh constraints.
#71
Teaching What You Should Teach: A Data-Based Distillation Method
In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the “Teaching what you Should Teach” strategy into a knowledge distillation framework, and propose a data-based distillation method named “TST” that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias to find out what meets the teacher’s strengths but the student’s weaknesses, by learning magnitudes and probabilities to generate suitable data samples. By training the data augmentation module and the generalized distillation paradigm alternately, a student model is learned with excellent generalization ability. To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
#3294
MMPN: Multi-supervised Mask Protection Network for Pansharpening
Pansharpening is to fuse a panchromatic (PAN) image with a multispectral (MS) image to obtain a high-spatial-resolution multispectral (HRMS) image. The deep learning-based pansharpening methods usually apply the convolution operation to extract features and only consider the similarity of gradient information between PAN and HRMS images, resulting in the problems of edge blur and spectral distortion in the fusion results. To solve this problem, a multi-supervised mask protection network (MMPN) is proposed to prevent spatial information from being damaged and overcome spectral distortion in the learning process. Firstly, by analyzing the relationships between high-resolution images and corresponding degraded images, a mask protection strategy (MPS) for edge protection is designed to guide the recovery of fused images. Then, based on the MPS, an MMPN containing four branches is constructed to generate the fusion and mask protection images. In MMPN, each branch employs a dual-stream multi-scale feature fusion module (DMFFM), which is built to extract and fuse the features of two input images. Finally, different loss terms are defined for the four branches, and combined into a joint loss function to realize network training. Experiments on simulated and real satellite datasets show that our method is superior to state-of-the-art methods both subjectively and objectively.
#4311
Hyperspectral Image Denoising Using Uncertainty-Aware Adjustor
Hyperspectral image (HSI) denoising has achieved promising results with the development of deep learning. A mainstream class of methods exploits the spatial-spectral correlations and recovers each band with the aids of neighboring bands, collectively referred to as spectral auxiliary networks. However, these methods treat entire adjacent spectral bands equally. In theory, clearer and nearer bands tend to contain more reliable spectral information than noisier and farther ones with higher uncertainties. How to achieve spectral enhancement and adaptation of each adjacent band has become an urgent problem in HSI denoising. This work presents the UA-Adjustor, a comprehensive adjustor that enhances denoising performance by considering both the band-to-pixel and enhancement-to-adjustment aspects. Specifically, UA-Adjustor consists of three stages that evaluate the importance of neighboring bands, enhance neighboring bands based on uncertainty perception, and adjust the weight of spatial pixels in adjacent bands through estimated uncertainty. For its simplicity, UA-Adjustor can be flexibly plugged into existing spectral auxiliary networks to improve denoising behavior at low cost. Extensive experimental results validate that the proposed solution can improve over recent state-of-the-art (SOTA) methods on both simulated and real-world benchmarks by a large margin.
#3562
Acoustic NLOS Imaging with Cross Modal Knowledge Distillation
Acoustic non-line-of-sight (NLOS) imaging aims to reconstruct hidden scenes by analyzing reflections of acoustic waves. Despite recent developments in the field, existing methods still have limitations such as sensitivity to noise in a physical model and difficulty in reconstructing unseen objects in a deep learning model. To address these limitations, we propose a novel cross-modal knowledge distillation (CMKD) approach for acoustic NLOS imaging. Our method transfers knowledge from a well-trained image network to an audio network, effectively combining the strengths of both modalities. As a result, it is robust to noise and superior in reconstructing unseen objects. Additionally, we evaluate real-world datasets and demonstrate that the proposed method outperforms state-of-the-art methods in acoustic NLOS imaging. The experimental results indicate that CMKD is an effective solution for addressing the limitations of current acoustic NLOS imaging methods. Our code, model, and data are available at https://github.com/shineh96/Acoustic-NLOS-CMKD.
#298
3D Surface Super-resolution from Enhanced 2D Normal Images: A Multimodal-driven Variational AutoEncoder Approach
3D surface super-resolution is an important technical tool in virtual reality, and it is also a research hotspot in computer vision. Due to the unstructured and irregular nature of 3D object data, it is usually difficult to obtain high-quality surface details and geometry textures via a low-cost hardware setup. In this paper, we establish a multimodal-driven variational autoencoder (mmVAE) framework to perform 3D surface enhancement based on 2D normal images. To fully leverage the multimodal learning, we investigate a multimodal Gaussian mixture model (mmGMM) to align and fuse the latent feature representations from different modalities, and further propose a cross-scale encoder-decoder structure to reconstruct high-resolution normal images. Experimental results on several benchmark datasets demonstrate that our method delivers promising surface geometry structures and details in comparison with competitive advances.
#3548
Analyzing and Combating Attribute Bias for Face Restoration
Face restoration (FR) recovers high resolution (HR) faces from low resolution (LR) faces and is challenging due to its ill-posed nature. With years of development, existing methods can produce quality HR faces with realistic details. However, we observe that key facial attributes (e.g., age and gender) of the restored faces could be dramatically different from the LR faces and call this phenomenon attribute bias, which is fatal when using FR for applications such as surveillance and security. Thus, we argue that FR should consider not only image quality as in existing works but also attribute bias. To this end, we thoroughly analyze attribute bias with extensive experiments and find that two major causes are the lack of attribute information in LR faces and bias in the training data. Moreover, we propose the DebiasFR framework to produce HR faces with high image quality and accurate facial attributes. The key design is to explicitly model the facial attributes, which also allows to adjust facial attributes for the output HR faces. Experiment results show that DebiasFR has comparable image quality but significantly smaller attribute bias when compared with state-of-the-art FR methods.
Wednesday 23rd August
15:30-16:50
Knowledge Representation and Reasoning (4/4)
#4163
Treewidth-Aware Complexity for Evaluating Epistemic Logic Programs
Logic programs are a popular formalism for encoding many problems relevant to knowledge representation and reasoning as well as artificial intelligence. However, for modeling rational behavior it is oftentimes required to represent the concepts of knowledge and possibility. Epistemic logic programs (ELPs) is such an extension that enables both concepts, which correspond to being true in all or some possible worlds or stable models. For these programs, the parameter treewidth has recently regained popularity. We present complexity results for the evaluation of key ELP fragments for treewidth, which are exponentially better than known results for full ELPs. Unfortunately, we prove that obtained runtimes can not be significantly improved, assuming the exponential time hypothesis. Our approach defines treewidth-aware reductions between quantified Boolean formulas and ELPs. We also establish
that the completion of a program, as used in modern solvers, can be turned treewidth-aware, thereby linearly preserving treewidth.
#880
A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram
Geometry problem solving (GPS) is a high-level mathematical reasoning requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still be short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate the research of GPS, we build a new large-scale and fine-annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution program. Experiments on PGPS9K and an existing dataset Geometry3K validate the superiority of our method over the state-of-the-art neural solvers. Our code, dataset and appendix material are available at \url{https://github.com/mingliangzhang2018/PGPS}.
#4516
A Comparative Study of Ranking Formulas Based on Consistency
Ranking is ubiquitous in everyday life. This paper is concerned with the problem of ranking information of a knowledge base when this latter is possibly inconsistent. In particular, the key issue is to elicit a plausibility order on the formulas in an inconsistent knowledge base. We show how such ordering can be obtained by using only the inherent structure of the knowledge base. We start by introducing a principled way a reasonable ranking framework for formulas should satisfy. Then, a variety of ordering criteria have been explored to define plausibility order over formulas based on consistency. Finally, we study the behaviour of the different formula ranking semantics in terms of the proposed logical postulates as well as their (in)-compatibility.
#2577
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations
Using reinforcement learning for automated theorem proving has recently received much attention. Current approaches use representations of logical statements that often rely on the names used in these statements and, as a result, the models are generally not transferable from one domain to another. The size of these representations and whether to include the whole theory or part of it are other important decisions that affect the performance of these approaches as well as their runtime efficiency. In this paper, we present NIAGRA; an ensemble Name InvAriant Graph RepresentAtion. NIAGRA addresses this problem by using 1) improved Graph Neural Networks for learning name-invariant formula representations that is tailored for their unique characteristics and 2) an efficient ensemble approach for automated theorem proving. Our experimental evaluation shows state-of-the-art performance on multiple datasets from different domains with improvements up to 10% compared to the best learning-based approaches. Furthermore, transfer learning experiments show that our approach significantly outperforms other learning-based approaches by up to 28%.
#3095
Learning Small Decision Trees with Large Domain
One favors decision trees (DTs) of the smallest size or depth to facilitate explainability and interpretability. However, learning such an optimal DT from data is well-known to be NP-hard. To overcome this complexity barrier, Ordyniak and Szeider (AAAI 21) initiated the study of optimal DT learning under the parameterized complexity perspective. They showed that solution size (i.e., number of nodes or depth of the DT) is insufficient to obtain fixed-parameter tractability (FPT). Therefore, they proposed an FPT algorithm that utilizes two auxiliary parameters: the maximum difference (as a structural property of the data set) and maximum domain size. They left it as an open question of whether bounding the maximum domain size is necessary.
The main result of this paper answers this question. We present FPT algorithms for learning a smallest or lowest-depth DT from data, with the only parameters solution size and maximum difference. Thus, our algorithm is significantly more potent than the one by Szeider and Ordyniak as it can handle problem inputs with features that range over unbounded domains. We also close several gaps concerning the quality of approximation one obtains by only considering DTs based on minimum support sets.
#2676
SAT-Based PAC Learning of Description Logic Concepts
We propose bounded fitting as a scheme for learning
description logic concepts in the presence of ontologies. A main
advantage is that the resulting learning algorithms come with
theoretical guarantees regarding their generalization to unseen
examples in the sense of PAC learning. We prove that, in contrast,
several other natural learning algorithms fail to provide such
guarantees. As a further contribution, we present the system SPELL
which efficiently implements bounded fitting for the description
logic ELHr based on a SAT solver, and compare its performance to a
state-of-the-art learner.
#SC10
MV-Datalog+/-: Effective Rule-based Reasoning with Uncertain Observations (Extended Abstract)
Modern data processing applications often combine information from a variety of complex sources. Oftentimes, some of these sources, like Machine-Learning systems or crowd-sourced data, are not strictly binary but associated with some degree of confidence in the observation. Ideally, reasoning over such data should take this additional information into account as much as possible. To this end, we propose extensions of Datalog and Datalog+/- to the semantics of Lukasiewicz logic Ł, one of the most common fuzzy logics. We show that such an extension preserves important properties from the classical case and how these properties can lead to efficient reasoning procedures for these new languages.
Wednesday 23rd August
15:30-16:50
Data Mining (3/3)
#4724
A Symbolic Approach to Computing Disjunctive Association Rules from Data
Association rule mining is one of the well-studied and most important knowledge discovery task in data mining. In this paper, we first introduce the k-disjunctive support based itemset, a generalization of the traditional model of itemset by allowing the absence of up to k items in each transaction matching the itemset. Then, to discover more expressive rules from data, we define the concept of (k, k′)-disjunctive support based association rules by considering the antecedent and the consequent of the rule as k-disjunctive and k′-disjunctive support based itemsets, respectively. Second, we provide a polynomial-time reduction of both the problems of mining k-disjunctive support based itemsets and (k, k′)-disjunctive support based association rules to the propositional satisfiability model enumeration task. Finally, we show through an extensive campaign of experiments on several popular real-life datasets the efficiency of our proposed approach
#4124
Online Harmonizing Gradient Descent for Imbalanced Data Streams One-Pass Classification
Many real-world streaming data are sequentially collected over time and with skew-distributed classes. In this situation, online learning models may tend to favor samples from majority classes, making the wrong decisions for those from minority classes. Previous methods try to balance the instance number of different classes or assign asymmetric cost values. They usually require data-buffers to store streaming data or pre-defined cost parameters. This study alternatively shows that the imbalance of instances can be implied by the imbalance of gradients. Then, we propose the Online Harmonizing Gradient Descent (OHGD) for one-pass online classification. By harmonizing the gradient magnitude occurred by different classes, the method avoids the bias of the proposed method in favor of the majority class. Specifically, OHGD requires no data-buffer, extra parameters, or prior knowledge. It also handles imbalanced data streams the same way that it would handle balanced data streams, which facilitates its easy implementation. On top of a few common and mild assumptions, the theoretical analysis proves that OHGD enjoys a satisfying sub-linear regret bound. Extensive experimental results demonstrate the high efficiency and effectiveness in handling imbalanced data streams.
#1540
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification
Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features’ prototypes and labels’ semantics thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e. the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It’s necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. And then the content of each node is reconstructed to a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph’s instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF with the comparison of state-of-the-art baselines.
#3395
OptIForest: Optimal Isolation Forest for Anomaly Detection
Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use the binary structure, a framework LSHiForest has demonstrated that the multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question on the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on the theoretical underpinning, we design a practical optimal isolation forest OptIForest incorporating clustering based learning to hash which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than the state-of-the-arts including the deep learning based methods.
#SC7
Bounding the Family-Wise Error Rate in Local Causal Discovery Using Rademacher Averages (Extended Abstract)
Causal discovery from observational data provides candidate causal relationships that need to be validated with ad-hoc experiments. Such experiments usually require major resources, and suitable techniques should therefore be applied to identify candidate relations while limiting false positives.
Local causal discovery provides a detailed overview of the variables influencing a target, and it focuses on two sets of variables. The first one, the Parent-Children set, comprises all the elements that are direct causes of the target or that are its direct consequences, while the second one, called the Markov boundary, is the minimal set of variables for the optimal prediction of the target.
In this paper we present RAveL, the first suite of algorithms for local causal discovery providing rigorous guarantees on false discoveries. Our algorithms exploit Rademacher averages, a key concept in statistical learning theory, to account for the multiple-hypothesis testing problem in high-dimensional scenarios. Moreover, we prove that state-of-the-art approaches cannot be adapted for the task due to their strong and untestable assumptions, and we complement our analyses with extensive experiments, on synthetic and real-world data.
#3254
SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs
Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As instances that appear in the real world are naturally connected and can be represented with graphs, graph neural networks become increasingly popular in tackling the anomaly detection problem. Despite the promising results, research on anomaly detection has almost exclusively focused on static graphs while the mining of anomalous patterns from dynamic graphs is rarely studied but has significant application value. In addition, anomaly detection is typically tackled from semi-supervised perspectives due to the lack of sufficient labeled data. However, most proposed methods are limited to merely exploiting labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By a combination of a time-equipped memory bank and a pseudo-label contrastive learning module, SAD is able to fully exploit the potential of large unlabeled samples and uncover underlying anomalies on evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies from dynamic graphs and outperforms existing advanced methods even when provided with only little labeled data.
#2754
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages from both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between the memory consumption and the hardware utilization, thus automatically generates the distributed computation graph and maximizes the overall system throughput. In addition, OSDP introduces operator splitting to further alleviate peak memory footprints during training with negligible overheads, which enables the trainability of larger models as well as the higher throughput. Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.
#SC16
Unsupervised Deep Subgraph Anomaly Detection (Extended Abstract)
Effectively mining anomalous subgraphs in networks is crucial for various applications, including disease outbreak detection, financial fraud detection, and activity monitoring in social networks. However, identifying anomalous subgraphs poses significant challenges due to their complex topological structures, high-dimensional attributes, multiple notions of anomalies, and the vast subgraph space within a given graph. Classical shallow models rely on handcrafted anomaly measure functions, limiting their applicability when prior knowledge is unavailable. Deep learning-based methods have shown promise in detecting node-level, edge-level, and graph-level anomalies, but subgraph-level anomaly detection remains under-explored due to difficulties in subgraph representation learning, supervision, and end-to-end anomaly quantification. To address these challenges, this paper introduces a novel deep framework named Anomalous Subgraph Autoencoder (AS-GAE). AS-GAE leverages an unsupervised and weakly supervised approach to extract anomalous subgraphs. It incorporates a location-aware graph autoencoder to uncover anomalous areas based on reconstruction mismatches and introduces a supermodular graph scoring function module to assign meaningful anomaly scores to subgraphs within the identified anomalous areas. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our proposed method.
Wednesday 23rd August
15:30-16:50
NLP: Information Extraction
#2841
ODEE: A One-Stage Object Detection Framework for Overlapping and Nested Event Extraction
The task of extracting overlapping and nested events has received significant attention in recent times, as prior research has primarily focused on extracting flat events, overlooking the intricacies of overlapping and nested occurrences. In this work, we present a new approach to Event Extraction (EE) by reformulating it as an object detection task on a table of token pairs. Our proposed one-stage event extractor, called ODEE, can handle overlapping and nested events. The model is designed with a vertex-based tagging scheme and two auxiliary tasks of predicting the spans and types of event trigger words and argument entities, leveraging the full span information of event elements. Furthermore, in the training stage, we introduce a negative sampling method for table cells to address the imbalance problem of positive and negative table cell tags, meanwhile improving computational efficiency. Empirical evaluations demonstrate that ODEE achieves the state-of-the-art performance on three benchmarks for overlapping and nested EE (i.e., FewFC, Genia11, and Genia13). Furthermore, ODEE outperforms current state-of-the-art methods in terms of both number of parameters and inference speed, indicating its high computational efficiency. To facilitate future research in this area, the codes are publicly available at https://github.com/NingJinzhong/ODEE.
#4194
Exploring Effective Inter-Encoder Semantic Interaction for Document-Level Relation Extraction
In document-level relation extraction (RE), the models are required to correctly predict implicit relations in documents via relational reasoning. To this end, many graph-based methods have been proposed for this task. Despite their success, these methods still suffer from several drawbacks: 1) their interaction between document encoder and graph encoder is usually unidirectional and insufficient; 2) their graph encoders often fail to capture the global context of nodes in document graph. In this paper, we propose a document-level RE model with a Graph-Transformer Network (GTN). The GTN includes two core sublayers: 1) the graph-attention sublayer that simultaneously models global and local contexts of nodes in the document graph; 2) the cross-attention sublayer, enabling GTN to capture the non-entity clue information from the document encoder. Furthermore, we introduce two auxiliary training tasks to enhance the bidirectional semantic interaction between the document encoder and GTN: 1) the graph node reconstruction that can effectively train our cross-attention sublayer to enhance the semantic transition from the document encoder to GTN; 2) the structure-aware adversarial knowledge distillation, by which we can effectively transfer the structural information of GTN to the document encoder. Experimental results on four benchmark datasets prove the effectiveness of our model. Our source code is available at https://github.com/DeepLearnXMU/DocRE-BSI.
#2490
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Transformers achieve promising performance in document understanding because of their high effectiveness and still suffer from quadratic computational complexity dependency on the sequence length. General efficient transformers are challenging to be directly adapted to model document. They are unable to handle the layout representation in documents, e.g. word, line and paragraph, on different granularity levels and seem hard to achieve a good trade-off between efficiency and performance. To tackle the concerns, we propose Fast-StrucTexT, an efficient multi-modal framework based on the StrucTexT algorithm with an hourglass transformer architecture, for visual document understanding. Specifically, we design a modality-guided dynamic token merging block to make the model learn multi-granularity representation and prunes redundant tokens. Additionally, we present a multi-modal interaction module called Symmetry Cross-Attention (SCA) to consider multi-modal fusion and efficiently guide the token mergence. The SCA allows one modality input as query to calculate cross attention with another modality in a dual phase. Extensive experiments on FUNSD, SROIE, and CORD datasets demonstrate that our model achieves the state-of-the-art performance and almost 1.9x faster inference time than the state-of-the-art methods.
#586
PasCore: A Chinese Overlapping Relation Extraction Model Based on Global Pointer Annotation Strategy
Recent work for extracting relations from texts has achieved excellent performance. However, existing studies mainly focus on simple relation extraction, these methods perform not well on overlapping triple problem because the tags of shared entities would conflict with each other. Especially, overlapping entities are common and indispensable in Chinese. To address this issue, this paper proposes PasCore, which utilizes a global pointer annotation strategy for overlapping relation extraction in Chinese. PasCore first obtains the sentence vector via general pre-training model encoder, and uses classifier to predicate relations. Subsequently, it uses global pointer annotation strategy for head entity annotation, which uses global tags to label the start and end positions of the entities. Finally, PasCore integrates the relation, head entity and its type to mark the tail entity. Furthermore, PasCore performs conditional layer normalization to fuse features, which connects all stages and greatly enriches the association between relations and entities. Experimental results on both Chinese and English real-world datasets demonstrate that PasCore outperforms strong baselines on relation extraction and, especially, shows superior performance on overlapping relation extraction.
#SC9
A Non-Factoid Question-Answering Taxonomy
Show Abstract
Hide Abstract
#2789
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.
#268
NerCo: A Contrastive Learning Based Two-Stage Chinese NER Method
Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition(NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in target representation space, which could finally negatively affect the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. And then we present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo firstly guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart that of different. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance.
Wednesday 23rd August
15:30-16:50
GTEP: Fair Division (2/2)
#2229
On Lower Bounds for Maximin Share Guarantees
We study the problem of fairly allocating a set of indivisible items to a set of agents with additive valuations. Recently, Feige et al. (WINE’21) proved that a maximin share (MMS) allocation exists for all instances with n agents and no more than n + 5 items. Moreover, they proved that an MMS allocation is not guaranteed to exist for instances with 3 agents and at least 9 items, or n ≥ 4 agents and at least 3n + 3 items. In this work, we shrink the gap between these upper and lower bounds for guaranteed existence of MMS allocations. We prove that for any integer c > 0, there exists a number of agents n_c such that an MMS allocation exists for any instance with n ≥ n_c agents and at most n + c items, where n_c ≤ ⌊0.6597^c · c!⌋ for allocation of goods and n_c ≤ ⌊0.7838^c · c!⌋ for chores. Furthermore, we show that for n ≠ 3 agents, all instances with n + 6 goods have an MMS allocation.
#J5941
Ordinal Maximin Share Approximation for Goods (Extended Abstract)
In fair division of indivisible goods, l-out-of-d maximin share (MMS) is the value that an agent can guarantee by partitioning the goods into d bundles and choosing the l least preferred bundles. Most existing works aim to guarantee to all agents a constant fraction of their 1-out-of-n MMS. But this guarantee is sensitive to small perturbation in agents’ cardinal valuations. We consider a more robust approximation notion, which depends only on the agents’ ordinal rankings of bundles. We prove the existence of l-out-of-floor((l+1/2)n) MMS allocations of goods for any integer l greater than or equal to 1, and present a polynomial-time algorithm that finds a 1-out-of-ceiling(3n/2) MMS allocation when l = 1. We further develop an algorithm that provides a weaker ordinal approximation to MMS for any l > 1.
#1003
New Fairness Concepts for Allocating Indivisible Items
For the fundamental problem of fairly dividing a set of indivisible items among agents, envy-freeness up to any item (EFX) and maximin fairness (MMS) are arguably the most compelling fairness concepts proposed till now. Unfortunately, despite significant efforts over the past few years, whether EFX allocations always exist is still an enigmatic open problem, let alone their efficient computation. Furthermore, today we know that MMS allocations are not always guaranteed to exist. These facts weaken the usefulness of both EFX and MMS, albeit their appealing conceptual characteristics.
We propose two alternative fairness concepts—called epistemic EFX (EEFX) and minimum EFX value fairness (MXS)—inspired by EFX and MMS. For both, we explore their relationships to well-studied fairness notions and, more importantly, prove that EEFX and MXS allocations always exist and can be computed efficiently for additive valuations. Our results justify that the new fairness concepts are excellent alternatives to EFX and MMS.
#3073
Simplification and Improvement of MMS Approximation
We consider the problem of fairly allocating a set of indivisible goods among n agents with additive valuations, using the popular fairness notion of maximin share (MMS). Since MMS allocations do not always exist, a series of works provided existence and algorithms for approximate MMS allocations. The Garg-Taki algorithm gives the current best approximation factor of (3/4 + 1/12n). Most of these results are based on complicated analyses, especially those providing better than 2/3 factor. Moreover, since no tight example is known of the Garg-Taki algorithm, it is unclear if this is the best factor of this approach. In this paper, we significantly simplify the analysis of this algorithm and also improve the existence guarantee to a factor of (3/4 + min(1/36, 3/(16n-4))). For small n, this provides a noticeable improvement. Furthermore, we present a tight example of this algorithm, showing that this may be the best factor one can hope for with the current techniques.
#3423
New Algorithms for the Fair and Efficient Allocation of Indivisible Chores
We study the problem of fairly and efficiently allocating indivisible chores among agents with additive disutility functions. We consider the widely used envy-based fairness properties of EF1 and EFX in conjunction with the efficiency property of fractional Pareto-optimality (fPO). Existence (and computation) of an allocation that is simultaneously EF1/EFX and fPO are challenging open problems, and we make progress on both of them. We show the existence of an allocation that is
– EF1 + fPO, when there are three agents,
– EF1 + fPO, when there are at most two disutility functions,
– EFX + fPO, for three agents with bivalued disutility functions.
These results are constructive, based on strongly polynomial-time algorithms. We also investigate non-existence and show that an allocation that is EFX+fPO need not exist, even for two agents.
#3115
Fairly Allocating Goods and (Terrible) Chores
We study the fair allocation of mixture of indivisible goods and chores under lexicographic preferences—a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation has remained a notable open problem. By identifying a class of instances with “terrible chores”, we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and Pareto optimal allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.
#4454
Fair and Efficient Allocation of Indivisible Chores with Surplus
We study fair division of indivisible chores among n agents with additive disutility functions. Two well-studied fairness notions for indivisible items are envy-freeness up to one/any item (EF1/EFX) and the standard notion of economic efficiency is Pareto optimality (PO). There is a noticeable gap between the results known for both EF1 and EFX in the goods and chores settings. The case of chores turns out to be much more challenging. We reduce this gap by providing slightly relaxed versions of the known results on goods for the chores setting. Interestingly, our algorithms run in polynomial time, unlike their analogous versions in the goods setting.
We introduce the concept of k surplus in the chores setting which means that up to k more chores are allocated to the agents and each of them is a copy of an original chore. We present a polynomial-time algorithm which gives EF1 and PO allocations with n-1 surplus.
We relax the notion of EFX slightly and define tEFX which requires that the envy from agent i to agent j is removed upon the transfer of any chore from the i’s bundle to j’s bundle. We give a polynomial-time algorithm that in the chores case for 3 agents returns an allocation which is either proportional or tEFX. Note that proportionality is a very strong criterion in the case of indivisible items, and hence both notions we guarantee are desirable.
#4004
Maximin-Aware Allocations of Indivisible Chores with Symmetric and Asymmetric Agents
The real-world deployment of fair allocation algorithms usually involves a heterogeneous population of users, which makes it challenging for the users to get complete knowledge of the allocation except for their own bundles. Chan et al. [IJCAI 2019] proposed a new fairness notion, maximin-awareness (MMA), which guarantees that every agent is not the worst-off one, no matter how the items that are not allocated to her are distributed. We adapt and generalize this notion to the case of indivisible chores and when the agents may have arbitrary weights. Due to the inherent difficulty of MMA, we also consider its up to one and up to any relaxations. A string of results on the existence and computation of MMA related fair allocations, and their connections to existing fairness concepts are given.
Wednesday 23rd August
15:30-16:50
Humans and AI
#3140
Can You Improve My Code? Optimizing Programs with Local Search
This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program’s performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants’ programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems.
These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
#3378
The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets
The use of AI-based decision aids in diverse domains has inspired many empirical investigations into how AI models’ decision recommendations impact humans’ decision accuracy in AI-assisted decision making, while explorations on the impacts on humans’ decision fairness are largely lacking despite their clear importance. In this paper, using a real-world business decision making scenario—bidding in rental housing markets—as our testbed, we present an experimental study on understanding how the bias level of the AI-based decision aid as well as the provision of AI explanations affect the fairness level of humans’ decisions, both during and after their usage of the decision aid. Our results suggest that when people are assisted by an AI-based decision aid, both the higher level of racial biases the decision aid exhibits and surprisingly, the presence of AI explanations, result in more unfair human decisions across racial groups. Moreover, these impacts are partly made through triggering humans’ “disparate interactions” with AI. However, regardless of the AI bias level and the presence of AI explanations, when people return to make independent decisions after their usage of the AI-based decision aid, their decisions no longer exhibit significant unfairness across racial groups.
#4991
A Hierarchical Approach to Population Training for Human-AI Collaboration
A major challenge for deep reinforcement learning (DRL) agents is to collaborate with novel partners that were not encountered by them during the training phase. This is specifically worsened by an increased variance in action responses when the DRL agents collaborate with human partners due to the lack of consistency in human behaviors. Recent work have shown that training a single agent as the best response to a diverse population of training partners significantly increases an agent’s robustness to novel partners. We further enhance the population-based training approach by introducing a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration. Our agent is able to learn multiple best-response policies as its low-level policy while at the same time, it learns a high-level policy that acts as a manager which allows the agent to dynamically switch between the low-level best-response policies based on its current partner. We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment. We also conducted a human study in the same environment to test the effectiveness of our method when partnering with real human subjects. Code is available at https://gitlab.com/marvl-hipt/hipt.
#1920
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
Collaborative tasks often begin with partial task knowledge and incomplete plans from each partner.
To complete these tasks, partners need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal.
While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collaboration.
To address this limitation, this paper takes a step towards Collaborative Plan Acquisition, where humans and agents strive to learn and communicate with each other to acquire a complete plan for joint tasks.
Specifically, we formulate a novel problem for agents to predict the missing task knowledge for themselves and for their partners based on rich perceptual and dialogue history.
We extend a situated dialogue benchmark for symmetric collaborative tasks in a 3D blocks world and investigate computational strategies for plan acquisition.
Our empirical results suggest that predicting the partner’s missing knowledge is a more viable approach than predicting one’s own.
We show that explicit modeling of the partner’s dialogue moves and mental states produces improved and more stable results than without.
These results provide insight for future AI agents that
can predict what knowledge their partner is missing and, therefore, can proactively communicate such information to help the partner acquire such missing knowledge toward a common understanding of joint tasks.
#555
Strategic Adversarial Attacks in AI-assisted Decision Making to Reduce Human Trust and Reliance
With the increased integration of AI technologies in human decision making processes, adversarial attacks on AI models become a greater concern than ever before as they may significantly hurt humans’ trust in AI models and decrease the effectiveness of human-AI collaboration. While many adversarial attack methods have been proposed to decrease the performance of an AI model, limited attention has been paid on understanding how these attacks will impact the human decision makers interacting with the model, and accordingly, how to strategically deploy adversarial attacks to maximize the reduction of human trust and reliance. In this paper, through a human-subject experiment, we first show that in AI-assisted decision making, the timing of the attacks largely influences how much humans decrease their trust in and reliance on AI—the decrease is particularly salient when attacks occur on decision making tasks that humans are highly confident themselves. Based on these insights, we next propose an algorithmic framework to infer the human decision maker’s hidden trust in the AI model and dynamically decide when the attacker should launch an attack to the model. Our evaluations show that following the proposed approach, attackers deploy more efficient attacks and achieve higher utility than adopting other baseline strategies.
#1685
Learning Heuristically-Selected and Neurally-Guided Feature for Age Group Recognition Using Unconstrained Smartphone Interaction
Owing to the boom of smartphone industries, the expansion of phone users has also been significant. Besides adults, children and elders have also begun to join the population of daily smartphone users. Such an expansion indeed facilitates the further exploration of the versatility and flexibility of digitization. However, these new users may also be susceptible to issues such as addiction, fraud, and insufficient accessibility. To fully utilize the capability of mobile devices without breaching personal privacy, we build the first corpus for age group recognition on smartphones with more than 1,445,087 unrestricted actions from 2,100 subjects. Then a series of heuristically-selected and neurally-guided features are proposed to increase the separability of the above dataset. Finally, we develop AgeCare, the first implicit and continuous system incorporated with bottom-to-top functionality without any restriction on user-phone interaction scenarios, for accurate age group recognition and age-tailored assistance on smartphones. Our system performs impressively well on this dataset and significantly surpasses the state-of-the-art methods.
#326
TDG4Crowd:Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing
In crowdsourcing, existing efforts mainly use real datasets collected from crowdsourcing as test datasets to evaluate the effectiveness of aggregation algorithms. However, these work ignore the fact that the datasets obtained by crowdsourcing are usually sparse and imbalanced due to limited budget. As a result, applying the same aggregation algorithm on different datasets often show contradicting conclusions. For example, on the RTE dataset, Dawid and Skene model performs significantly better than Majority Voting, while on the LableMe dataset, the experiments give the opposite conclusion. It is challenging to obtain comprehensive and balanced datasets at a low cost. To our best knowledge, little effort have been made to the fair evaluation of aggregation algorithms. To fill in this gap, we propose a novel method named TDG4Crowd that can automatically generate comprehensive and balanced datasets. Using Kullback Leibler divergence and Kolmogorov–Smirnov test, the experiment results show the superior of our method compared with others. Aggregation algorithms also perform more consistently on the synthetic datasets generated using our method.
Wednesday 23rd August
15:30-16:50
AI Ethics, Trust, Fairness (2/3)
#193
SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles
A critical concern in data-driven processes is to build models whose outcomes do not discriminate against some protected groups. In learning tasks, knowledge of the group attributes is essential to ensure non-discrimination, but in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of individuals’ sensitive information while also allowing it to learn non-discriminatory predictors.
A key feature of the proposed model is to enable the use of off-the-shelves and non-private fair models to create a privacy-preserving and fair model. The paper analyzes the relation between accuracy, privacy, and fairness, and assesses the benefits of the proposed models on several prediction tasks. In particular, this proposal allows both scalable and accurate training of private and fair models for very large neural networks.
#1310
Fairness via Group Contribution Matching
Fairness issues in Deep Learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) in the training process to understand the cause of such bias development process. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples are drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address the above issues, we propose an easy but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that our GCM effectively improves fairness and outperforms other methods significantly.
#1268
Towards Semantics- and Domain-Aware Adversarial Attacks
Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3\% improvement in attack success rates and 9.8\% improvement in the quality of adversarial examples.
#4734
Explanation-Guided Reward Alignment
Agents often need to infer a reward function from observations to learn desired behaviors. However, agents may infer a reward function that does not align with the original intent because there can be multiple reward functions consistent with its observations. Operating based on such misaligned rewards can be risky. Furthermore, black-box representations make it difficult to verify the learned rewards and prevent harmful behavior. We present a framework for verifying and improving reward alignment using explanations and show how explanations can help detect misalignment and reveal failure cases in novel scenarios. The problem is formulated as inverse reinforcement learning from ranked trajectories. Verification tests created from the trajectory dataset are used to iteratively validate and improve reward alignment. The agent explains its learned reward and a tester signals whether the explanation passes the test. In cases where the explanation fails, the agent offers alternative explanations to gather feedback, which is then used to improve the learned reward. We analyze the efficiency of our approach in improving reward alignment using different types of explanations and demonstrate its effectiveness in five domains.
#SV5587
Assessing and Enforcing Fairness in the AI Lifecycle
A significant challenge in detecting and mitigating bias is creating a mindset amongst AI developers to address unfairness. The current literature on fairness is broad, and the learning curve to distinguish where to use existing metrics and techniques for bias detection or mitigation is difficult. This survey systematises the state-of-the-art about distinct notions of fairness and relative techniques for bias mitigation according to the AI lifecycle. Gaps and challenges identified during the development of this work are also discussed.
#3078
Negative Flux Aggregation to Estimate Feature Attributions
There are increasing demands for understanding deep neural networks’ (DNNs) behavior spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of the deep neural network architectures, explaining DNN predictions still remains as an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input feature’s attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution map. Unlike the previous techniques, ours doesn’t rely on fitting a surrogate model nor need any path integration of gradients. Both qualitative and quantitative experiments demonstrate a superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at https://github.com/xinli0928/NeFLAG.
#2099
Robust Reinforcement Learning via Progressive Task Sequence
Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address the robust RL problem by solving a max-min problem. The main idea is to maximize the cumulative reward under the worst-possible perturbations. However, the worst-case optimization either leads to overly conservative solutions or unstable training process, which further affects the policy robustness and generalization performance. In this paper, we tackle this problem from both formulation definition and algorithm design. First, we formulate the robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both the worst cases and the non-worst cases. Then, we propose a novel framework DRRL to solve the max-expectation optimization. Given our definition of the feasible tasks, a task generation and sequencing mechanism is introduced to dynamically output tasks at appropriate difficulty level for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning to improve the policy robustness and the training stability. Finally, extensive experiments demonstrate that the proposed method exhibits significant performance on the unmanned CarRacing game and multiple high-dimensional MuJoCo environments.
Wednesday 23rd August
15:30-16:50
Planning and Scheduling (2/3)
#3449
K∗ Search over Orbit Space for Top-k Planning
Top-k planning, the task of finding k top-cost plans, is a key formalism for many planning applications and K* search is a well-established approach to top-k planning. The algorithm iteratively runs A* search and Eppstein’s algorithm until a sufficient number of plans is found. The performance of K* algorithm is therefore inherently limited by the performance of A*, and in order to improve K* performance, that of A* must be improved. In cost-optimal planning, orbit space search improves A* performance by exploiting symmetry pruning, essentially performing A* in the orbit space instead of state space. In this work, we take a similar approach to top-k planning. We show theoretical equivalence between the goal paths in the state space and in the orbit space, allowing to perform K* search in the orbit space instead, reconstructing plans from the found paths in the orbit space. We prove that our algorithm is sound and complete for top-k planning and empirically show it to achieve state-of-the-art performance, overtaking all existing to date top-k planners. The code is available at https://github.com/IBM/kstar.
#5292
Optimal Decision Tree Policies for Markov Decision Processes
Interpretability of reinforcement learning policies is essential for many real-world tasks but learning such interpretable policies is a hard problem. Particularly, rule-based policies such as decision trees and rules lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a policy that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MPDs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that on small MDPs, depth 3 OMDTs often perform close to optimally.
#J5926
Simplified Risk-aware Decision Making with Belief-dependent Rewards in Partially Observable Domains (Extended Abstract)
It is a long-standing objective to ease the computation burden incurred by the decision-making problem under partial observability. Identifying the sensitivity to simplification of various components of the original problem has tremendous ramifications. Yet, algorithms for decision-making under uncertainty usually lean on approximations or heuristics without quantifying their effect. Therefore, challenging scenarios could severely impair the performance of such methods. In this paper, we extend the decision-making mechanism to the whole by removing standard approximations and considering all previously suppressed stochastic sources of variability. On top of this extension, we scrutinize the distribution of the return. We begin from a return given a single candidate policy and continue to the pair of returns given a corresponding pair of candidate policies. Furthermore, we present novel stochastic bounds on the return and novel tools, Probabilistic Loss (PLoss) and its online accessible counterpart (PbLoss), to characterize the effect of a simplification.
#3422
Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays
Evacuation planning is a crucial part of disaster management. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that utilizes heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion, as well as, find optimized plans considering such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of three different metrics. However, when congestion dependent delay is considered, MIP-LNS-SIM outperforms MIP-LNS in multiple performance metrics. In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time compared to MIP-LNS.
#J5930
Motion Planning Under Uncertainty with Complex Agents and Environments via Hybrid Search (Extended Abstract)
As autonomous systems tackle more real-world situations, mission success oftentimes cannot be guaranteed and the planner must reason about the probability of failure. Unfortunately, computing a trajectory that satisfies mission goals while constraining the probability of failure is difficult because of the need to reason about complex, multidimensional probability distributions. Recent methods have seen success using chance-constrained, model-based planning. We argue there are two main drawbacks to these approaches. First, current methods suffer from an inability to deal with expressive environment models such as 3D non-convex obstacles. Second, most planners rely on considerable simplifications when computing trajectory risk including approximating the agent’s dynamics, geometry, and uncertainty. We apply hybrid search to the risk-bound, goal-directed planning problem. The hybrid search consists of a region planner and a trajectory planner. The region planner makes discrete choices by reasoning about geometric regions that the agent should visit in order to accomplish its mission. In formulating the region planner, we propose landmark regions that help produce obstacle-free paths. The region planner passes paths through the environment to a trajectory planner; the task of the trajectory planner is to optimize trajectories that respect the agent’s dynamics and the user’s desired risk of mission failure. We discuss three approaches to modeling trajectory risk: a CDF-based approach, a sampling-based collocation method, and an algorithm named Shooting Method Monte Carlo. A variety of 2D and 3D test cases are presented in the full paper including a linear case, a Dubins car model, and an underwater autonomous vehicle. The method is shown to outperform other methods in terms of speed and utility of the solution. Additionally, the models of trajectory risk are shown to better approximate risk in simulation.
#J5950
Gradient-Based Mixed Planning with Symbolic and Numeric Action Parameters (Extended Abstract)
Dealing with planning problems with both logical relations and numeric changes in real-world dynamic environments is challenging. Existing numeric planning systems for the problem often discretize numeric variables or impose convex constraints on numeric variables, which harms the performance when solving problems, especially when the problems contain obstacles and non-linear numeric effects. In this work, we propose a novel algorithm framework to solve numeric planning problems mixed with logical relations and numeric changes based on gradient descent. We cast the numeric planning with logical relations and numeric changes as an optimization problem. Specifically, we extend the syntax to allow parameters of action models to be either objects or real-valued numbers, which enhances the ability to model real-world numeric effects. Based on the extended modeling language, we propose a gradient-based framework to simultaneously optimize numeric parameters and compute appropriate actions to form candidate plans. The gradient-based framework is composed of an algorithmic heuristic module based on propositional operations to select actions and generate constraints for gradient descent, an algorithmic transition module to update states to the next ones, and a loss module to compute loss. We repeatedly minimize loss by updating numeric parameters and compute candidate plans until it converges into a valid plan for the planning problem.
#J5924
A Logic-based Explanation Generation Framework for Classical and Hybrid Planning Problems (Extended Abstract)
In human-aware planning systems, a planning agent might need to explain its plan to a human user when that plan appears to be non-feasible or sub-optimal. A popular approach, called model reconciliation, has been proposed as a way to bring the model of the human user closer to the agent’s model. In this paper, we approach the model reconciliation problem from a different perspective, that of knowledge representation and reasoning, and demonstrate that our approach can be applied not only to classical planning problems but also hybrid systems planning problems with durative actions and events/processes.
#3398
Action Space Reduction for Planning Domains
Planning tasks succinctly represent labeled transition systems, with each ground action corresponding to a label. This granularity, however, is not necessary for solving planning tasks and can be harmful, especially for model-free methods. In order to apply such methods, the label sets are often manually reduced. In this work, we propose automating this manual process. We characterize a valid label reduction for classical planning tasks and propose an automated way of obtaining such valid reductions by leveraging lifted mutex groups. Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning. The code and supplementary material are available at https://github.com/IBM/Parameter-Seed-Set.
Wednesday 23rd August
15:30-16:50
AI for Social Good – ML (2/2)
#AI4SG1368
AudioQR: Deep Neural Audio Watermarks For QR Code
Image-based quick response (QR) code is frequently used, but creates barriers for the visual impaired people. With the goal of “AI for good”, this paper proposes the AudioQR, a barrier-free QR coding mechanism for the visually impaired population via deep neural audio watermarks. Previous audio watermarking approaches are mainly based on handcrafted pipelines, which is less secure and difficult to apply in large-scale scenarios. In contrast, AudioQR is the first comprehensive end-to-end pipeline that hides watermarks in audio imperceptibly and robustly. To achieve this, we jointly train an encoder and decoder, where the encoder is structured as a concatenation of transposed convolutions and multi-receptive field fusion modules. Moreover, we customize the decoder training with a stochastic data augmentation chain to make the watermarked audio robust towards different audio distortions, such as environment background, room impulse response when playing through the air, music surrounding, and Gaussian noise. Experiment results indicate that AudioQR can efficiently hide arbitrary information into audio without introducing significant perceptible difference. Our code is available at https://github.com/xinghua-qu/AudioQR.
#AI4SG3664
Interpret ESG Rating’s Impact on the Industrial Chain Using Graph Neural Networks
We conduct a quantitative analysis of the development of the industry chain from the environmental, social, and governance (ESG) perspective, which is an overall measure of sustainability. Factors that may impact the performance of the industrial chain have been studied in the literature, such as government regulation, monetary policy, etc. Our interest lies in how the sustainability change (i.e., ESG shock) affects the performance of the industrial chain. To achieve this goal, we model the industrial chain with a graph neural network (GNN) and conduct node regression on two financial performance metrics, namely, the aggregated profitability ratios and operating margin. To quantify the effects of ESG, we propose to compute the interaction between ESG shocks and industrial chain features with a cross-attention module, and then filter the original node features in the graph regression. Experiments on two real datasets demonstrate that (i) there are significant effects of ESG shocks on the industrial chain, and (ii) model parameters including regression coefficients and the attention map can explain how ESG shocks affect the performance of the industrial chain.
#AI4SG5259
Unified Model for Crystalline Material Generation
One of the greatest challenges facing our society is the discovery of new innovative crystal materials with specific properties. Recently, the problem of generating crystal materials has received increasing attention, however, it remains unclear to what extent, or in what way, we can develop generative models that consider both the periodicity and equivalence geometric of crystal structures. To alleviate this issue, we propose two unified models that act at the same time on crystal lattice and atomic positions using periodic equivariant architectures. Our models are capable to learn any arbitrary crystal lattice deformation by lowering the total energy to reach thermodynamic stability. Code and data are available at https://github.com/aklipf/GemsNet.
#AI4SG5682
Forecasting Soil Moisture Using Domain Inspired Temporal Graph Convolution Neural Networks To Guide Sustainable Crop Management
Agriculture faces unprecedented challenges due to climate change, population growth, and water scarcity. These challenges highlight the need for efficient resource usage to optimize crop production. Conventional techniques for forecasting hydrological response features, such as soil moisture, rely on physics-based and empirical hydrological models, which necessitate significant time and domain expertise. Drawing inspiration from traditional hydrological modeling, a novel temporal graph convolution neural network has been constructed. This involves grouping units based on their time-varying hydrological properties, constructing graph topologies for each cluster based on similarity using dynamic time warping, and utilizing graph convolutions and a gated recurrent neural network to forecast soil moisture. The method has been trained, validated, and tested on field-scale time series data spanning 40 years in northeastern United States. Results show that using domain-inspired clustering with time series graph neural networks is more effective in forecasting soil moisture than existing models. This framework is being deployed as part of a pro bono social impact program that leverages hybrid cloud and AI technologies to enhance and scale non-profit and government organizations. The trained models are currently being deployed on a series of small-holding farms in central Texas.
#AI4SG5757
Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings
Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. To our knowledge, this is the first-ever large-scale computational analysis of gender inequality in Indian divorce, a taboo-topic for ages. While emerging data sources (e.g., public court records made available on the web) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. A thorough analysis of potential gaps and limitations present in extant NLP resources is thus of paramount importance. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.
#AI4SG5782
SUSTAINABLESIGNALS: An AI Approach for Inferring Consumer Product Sustainability
The everyday consumption of household goods is a significant source of environmental pollution. The increase of online shopping affords an opportunity to provide consumers with actionable feedback on the social and environmental impact of potential purchases, at the exact moment when it is relevant. Unfortunately, consumers are inundated with ambiguous sustainability information. For example, greenwashing can make it difficult to identify environmentally friendly products. The highest-quality options, such as Life Cycle Assessment (LCA) scores or tailored impact certificates (e.g., environmentally friendly tags), designed for assessing the environmental impact of consumption, are ineffective in the setting of online shopping. They are simply too costly to provide a feasible solution when scaled up, and often rely on data from self-interested market players. We contribute an analysis of this online environment, exploring how the dynamic between sellers and consumers surfaces claims and concerns regarding sustainable consumption. In order to better provide information to consumers, we propose a machine learning method that can discover signals of sustainability from these interactions. Our method, SustainableSignals, is a first step in scaling up the provision of sustainability cues to online consumers.
#AI4SG5791
Machine Learning Driven Aid Classification for Sustainable Development
This paper explores how machine learning can help classify aid activities by sector using the OECD Creditor Reporting System (CRS). The CRS is a key source of data for monitoring and evaluating aid flows in line with the United Nations Sustainable Development Goals (SDGs), especially SDG17 which calls for global partnership and data sharing. To address the challenges of current labor-intensive practices of assigning the code and the related human inefficiencies, we propose a machine learning solution that uses ELECTRA to suggest relevant five-digit purpose codes in CRS for aid activities, achieving an accuracy of 0.9575 for the top-3 recommendations. We also conduct qualitative research based on semi-structured interviews and focus group discussions with SDG experts who assess the model results and provide feedback. We discuss the policy, practical, and methodological implications of our work and highlight the potential of AI applications to improve routine tasks in the public sector and foster partnerships for achieving the SDGs.
#AI4SG5815
Optimization-driven Demand Prediction Framework for Suburban Dynamic Demand-Responsive Transport Systems
Demand-Responsive Transport (DRT) has grown over the last decade as an ecological solution to both metropolitan and suburban areas. It provides a more efficient public transport service in metropolitan areas and satisfies the mobility needs in sparse and heterogeneous suburban areas. Traditionally, DRT operators build the plannings of their drivers by relying on myopic insertion heuristics that do not take into account the dynamic nature of such a service. We thus investigate in this work the potential of a Demand Prediction Framework used specifically to build more flexible routes within a Dynamic Dial-a-Ride Problem (DaRP) solver. We show how to obtain a Machine Learning forecasting model that is explicitly designed for optimization purposes. The prediction task is further complicated by the fact that the historical dataset is significantly sparse. We finally show how the predicted travel requests can be integrated within an optimization scheme in order to compute better plannings at the start of the day. Numerical results support the fact that, despite the data sparsity challenge as well as the optimization-driven constraints that result from the DaRP model, such a look-ahead approach can improve up to 3.5% the average insertion rate of an actual DRT service.
Wednesday 23rd August
17:00-18:00
Demos 2
#DM5680
IMPsys: An Intelligent Mold Processing System for Smart Factory
The explosive popularity of smart manufacturing has caught the attention of researchers in terms of intelligent mold processing and management. Machining mold components is a crucial step in the mold production process for many industries, which creates (e.g., cutting, drilling, and shaping a metal) the individual parts (e.g., core pins, ejector pins, cavities, slides, and lifters) that make up a mold used in manufacturing. We present IMPsys, an AI-based system that automatically explores machining jobs, infers their processing time and schedules them on machines, given numerous 3D modelling files of mold components. Our demo video can be found at: http://bit.ly/3EeKnyL.
#DM5681
Matting Moments: A Unified Data-Driven Matting Engine for Mobile AIGC in Photo Gallery
Image matting is a fundamental technique in visual understanding and has become one of the most significant capabilities in mobile phones. Despite the development of mobile storage and computing power, achieving diverse mobile Artificial Intelligence Generated Content (AIGC) applications remains a great challenge. To address this issue, we present an innovative demonstration of an automatic system called “Matting Moments” that enables automatic image editing based on matting models in different scenarios. Coupled with accurate and refined matting subjects, our system provides visual element editing abilities and backend services for distribution and recommendation that respond to emotional expressions. Our system comprises three components: 1) photo content structuring, 2) data-driven matting engine, and 3) AIGC functions for generation, which automatically achieve diverse photo beautification in the gallery. This system offers a unified framework that guides consumers to obtain intelligent recommendations with beautifully generated contents, helping them enjoy the moments and memories of their present life.
#DM5732
Humming2Music: Being A Composer As Long As You Can Humming
Creating a piece of music is difficult for people who have never been trained to compose. We present an automatic music generation system to lower the threshold of creating music. The system takes the user’s humming as input and creates full music based on the humming melody. The system consists of five modules: 1) humming transcription, 2) melody generation, 3) broken chord generation, 4) accompaniment generation, and 5) audio synthesis. The first module transcribes the user’s humming audio to a score, and then the melody generation module composes a complete melody based on the user’s humming melody. After that, the third module will generate a broken chord track to accompany the full melody, and the fourth module will create more accompanying tracks. Finally, the audio synthesis module mixes all the tracks to generate the music. Through the user experiment, our system can generate high-quality music with natural expression based on the user’s humming input.
#DM5712
LingGe: An Automatic Ancient Chinese Poem-to-Song Generation System
This paper presents a novel system, named LingGe (“伶歌” in Chinese), to generate songs for ancient Chinese poems automatically. LingGe takes the poem as the lyric, composes music conditioned on the lyric, and finally outputs a full song including the singing and the accompaniment. It consists of four modules: rhythm recognition, melody generation, accompaniment generation, and audio synthesis. Firstly, the rhythm recognition module analyzes the song structure and rhythm according to the poem. Secondly, the melody generation module assembles the rhythm into the template and then generates the melody. Thirdly, the accompaniment generation module predicts the accompaniment in harmony with the melody. Finally, the audio synthesis module generates singing and accompaniment audio and then mixes them to obtain songs. The results show that LingGe can generate high-quality and expressive songs for ancient Chinese poems, both in harmony and rhythm.
Thursday 24th August
Thursday 24th August
10:15-11:15
Machine Learning (7/12)
#3686
Incremental and Decremental Optimal Margin Distribution Learning
Incremental and decremental learning (IDL) deals with the tasks where new data arrives sequentially as a stream or old data turns unavailable continually due to the privacy protection. Existing IDL methods mainly focus on support vector machine and its variants with linear-type loss. There are few studies about the quadratic-type loss, whose Lagrange multipliers are unbounded and much more difficult to track. In this paper, we take the latest statistical learning framework optimal margin distribution machine (ODM) which involves a quadratic-type loss due to the optimization of margin variance, for example, and equip it with the ability to handle IDL tasks. Our proposed ID-ODM can avoid updating the Lagrange multipliers in an infinite range by determining their optimal values beforehand so as to enjoy much more efficiency. Moreover, ID-ODM is also applicable when multiple instances come and leave simultaneously. Extensive empirical studies show that ID-ODM can achieve 9.1x speedup on average with almost no generalization lost compared to retraining ODM on new data set from scratch.
#SV5506
Curriculum Graph Machine Learning: A Survey
Graph machine learning has been extensively studied in both academia and industry. However, in the literature, most existing graph machine learning models are designed to conduct training with data samples in a random order, which may suffer from suboptimal performance due to ignoring the importance of different graph data samples and their training orders for the model optimization status. To tackle this critical problem, curriculum graph machine learning (Graph CL), which integrates the strength of graph machine learning and curriculum learning, arises and attracts an increasing amount of attention from the research community. Therefore, in this paper, we comprehensively overview approaches on Graph CL and present a detailed survey of recent advances in this direction. Specifically, we first discuss the key challenges of Graph CL and provide its formal problem definition. Then, we categorize and summarize existing methods into three classes based on three kinds of graph machine learning tasks, i.e., node-level, link-level, and graph-level tasks. Finally, we share our thoughts on future research directions. To the best of our knowledge, this paper is the first survey for curriculum graph machine learning.
#2638
Prediction with Incomplete Data under Agnostic Mask Distribution Shift
Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., mask distribution may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion when learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on the intra-mask correlation and that between features and mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
#2366
Enabling Abductive Learning to Exploit Knowledge Graph
Most systems integrating data-driven machine learning with knowledge-driven reasoning usually rely on a specifically designed knowledge base to enable efficient symbolic inference. However, it could be cumbersome for the nonexpert end-users to prepare such a knowledge base in real tasks. Recent years have witnessed the success of large-scale knowledge graphs, which could be ideal domain knowledge resources for real-world machine learning tasks. However, these large-scale knowledge graphs usually contain much information that is irrelevant to a specific learning task. Moreover, they often contain a certain degree of noise. Existing methods can hardly make use of them because the large-scale probabilistic logical inference is usually intractable. To address these problems, we present ABductive Learning with Knowledge Graph (ABL-KG) that can automatically mine logic rules from knowledge graphs during learning, using a knowledge forgetting mechanism for filtering out irrelevant information. Meanwhile, these rules can form a logic program that enables efficient joint optimization of the machine learning model and logic inference within the Abductive Learning (ABL) framework. Experiments on four different tasks show that ABL-KG can automatically extract useful rules from large-scale and noisy knowledge graphs, and significantly improve the performance of machine learning with only a handful of labeled data.
#2671
Contrastive Label Enhancement
Label distribution learning (LDL) is a new machine learning paradigm for solving label ambiguity. Since it is difficult to directly obtain label distributions, many studies are focusing on how to recover label distributions from logical labels, dubbed label enhancement (LE). Existing LE methods estimate label distributions by simply building a mapping relationship between features and label distributions under the supervision of logical labels. They typically overlook the fact that both features and logical labels are descriptions of the instance from different views. Therefore, we propose a novel method called Contrastive Label Enhancement (ConLE) which integrates features and logical labels into the unified projection space to generate high-level features by contrastive learning strategy. In this approach, features and logical labels belonging to the same sample are pulled closer, while those of different samples are projected farther away from each other in the projection space. Subsequently, we leverage the obtained high-level features to gain label distributions through a well-designed training strategy that considers the consistency of label attributes. Extensive experiments on LDL benchmark datasets demonstrate the effectiveness and superiority of our method.
#1666
Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black-box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.
Thursday 24th August
10:15-11:15
ML: Applications
#3573
Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we incorporate the unbiased property into the split finding algorithm and develop UnbiasedGBM to solve the overfitting issue of GBDT. We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and Catboost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
#2692
A Novel Demand Response Model and Method for Peak Reduction in Smart Grids — PowerTAC
One of the widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers’ (agents’) usage patterns in response to the signal from the distribution company. Often, these signals are in the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that depicts the probability of an agent reducing its load as a function of the discounts offered to them. We call it reduction probability (RP). RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS–ExpResponse, that outputs the discounts to each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB–ExpResponse, to learn RRs. Experimentally we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
#4257
Automatic Truss Design with Reinforcement Learning
Truss layout design, namely finding a lightweight truss layout satisfying all the physical constraints, is a fundamental problem in the building industry. Generating the optimal layout is a challenging combinatorial optimization problem, which can be extremely expensive to solve by exhaustive search. Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is infeasible either, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.
In this paper, we develop AutoTruss, a two-stage framework to efficiently generate both lightweight and valid truss layouts. AutoTruss first adopts Monte Carlo tree search to discover a diverse collection of valid layouts. Then RL is applied to iteratively refine the valid solutions. We conduct experiments and ablation studies in popular truss layout design test cases in both 2D and 3D settings. AutoTruss outperforms the best-reported layouts by 25.1% in the most challenging 3D test cases, resulting in the first effective deep-RL-based approach in the truss layout design literature.
#2869
Teacher Assistant-Based Knowledge Distillation Extracting Multi-level Features on Single Channel Sleep EEG
Sleep stage classification is of great significance to the diagnosis of sleep disorders. However, existing sleep stage classification models based on deep learning are usually relatively large in size (wider and deeper), which makes them hard to be deployed on wearable devices. Therefore, it is a challenge to lighten the existing sleep stage classification models. In this paper, we propose a novel general knowledge distillation framework for sleep stage classification tasks called SleepKD. Our SleepKD, composed of the multi-level module, teacher assistant module, and other knowledge distillation modules, aims to lighten large-scale sleep stage classification models. Specifically, the multi-level module is able to transfer the multi-level knowledge extracted from sleep signals by the teacher model (large-scale model) to the student model (lightweight model). Moreover, the teacher assistant module bridges the large gap between the teacher and student network, and further improves the distillation. We evaluate our method on two public sleep datasets (Sleep-EDF and ISRUC-III). Compared to the baseline methods, the results show that our knowledge distillation framework achieves state-of-the-art performance. SleepKD can significantly lighten the sleep model while maintaining its classification performance. The source code is available at https://github.com/HychaoWang/SleepKD.
#J5919
Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation (Extended Abstract)
Idle vehicle relocation is crucial for addressing demand-supply imbalance that frequently arises in the ride-hailing system. Current mainstream methodologies – optimization and reinforcement learning – suffer from obvious computational drawbacks. Optimization models need to be solved in real-time and often trade off model fidelity (hence quality of solutions) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of the two while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach reduces both the relocation costs and computation time significantly compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.
Thursday 24th August
10:15-11:15
ML: Sequence and Graph Learning
#1624
Hierarchical Transformer for Scalable Graph Learning
Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency.
#3545
Graph Neural Convection-Diffusion with Heterophily
Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs. The connected nodes are likely to be from different classes or have dissimilar features on heterophilic graphs. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to take into account both the diffusion of information due to homophily and the “convection” of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs, compared to the state-of-the-art methods. The code is available at https://github.com/zknus/Graph-Diffusion-CDE.
#1738
Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs
Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions. However, these algorithms struggle to learn policy distributions that are proportional to rewards and have limited exploration capabilities. In this paper, we innovatively formulate the graph active learning problem as a generative process, named GFlowGNN, which generates various samples through sequential actions with probabilities precisely proportional to a predefined reward function. Furthermore, we propose the concept of flow nodes and flow features to efficiently model graphs as flows based on generative flow networks, where the policy network is trained with specially designed rewards. Extensive experiments on real datasets show that the proposed approach has good exploration capability and transferability, outperforming various state-of-the-art methods.
#2716
LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity
Heterophily has been considered as an issue that hurts the performance of Graph Neural Networks (GNNs). To address this issue, some existing work uses a graph-level weighted fusion of the information of multi-hop neighbors to include more nodes with homophily. However, the heterophily might differ among nodes, which requires to consider the local topology. Motivated by it, we propose to use the local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module. For better fusion, we propose a novel and efficient Initial Residual Difference Connection (IRDC) to extract more informative multi-hop information. Moreover, we provide theoretical analysis on the effectiveness of LocalSim representing node homophily on synthetic graphs. Extensive evaluations over real benchmark datasets show that our proposed method, namely Local Similarity Graph Neural Network (LSGNN), can offer comparable or superior state-of-the-art performance on both homophilic and heterophilic graphs. Meanwhile, the plug-and-play model can significantly boost the performance of existing GNNs.
#1111
Violin: Virtual Overbridge Linking for Enhancing Semi-supervised Learning on Graphs with Limited Labels
Graph Neural Networks (GNNs) is a family of promising tools for graph semi-supervised learning. However, in training, most existing GNNs rely heavily on a large amount of labeled data, which is rare in real-world scenarios. Unlabeled data with useful information are usually under-exploited, which limits the representation power of GNNs. To handle these problems, we propose Virtual Overbridge Linking (Violin), a generic framework to enhance the learning capacity of common GNNs. By learning to add virtual overbridges between two nodes that are estimated to be semantic-consistent, labeled and unlabeled data can be correlated. Supervised information can be well utilized in training while simultaneously inducing the model to learn from unlabeled data. Discriminative relation patterns extracted from unlabeled nodes can also be shared with other nodes even if they are remote from each other. Motivated by recent advances in data augmentations, we additionally integrate Violin with the consistency regularized training. Such a scheme yields node representations with better robustness, which significantly enhances a GNN. Violin can be readily extended to a wide range of GNNs without introducing additional learnable parameters. Extensive experiments on six datasets demonstrate that our method is effective and robust under low-label rate scenarios, where Violin can boost some GNNs’ performance by over 10% on node classifications.
#2470
LGI-GT: Graph Transformers with Local and Global Operators Interleaving
Since Transformers can alleviate some critical and fundamental problems of graph neural networks (GNNs), such as over-smoothing, over-squashing and limited expressiveness, they have been successfully applied to graph representation learning and achieved impressive results. However, although there are many works dedicated to make graph Transformers (GTs) aware of the structure and edge information by specifically tailored attention forms or graph-related positional and structural encodings, few works address the problem of how to construct high-performing GTs with modules of GNNs and Transformers. In this paper, we propose a novel graph Transformer with local and global operators interleaving (LGI-GT), in which we further design a new method propagating embeddings of the [CLS] token for global information representation. Additionally, we propose an effective message passing module called edge enhanced local attention (EELA), which makes LGI-GT a full-attention GT. Extensive experiments demonstrate that LGI-GT performs consistently better than previous state-of-the-art GNNs and GTs, while ablation studies show the effectiveness of the proposed LGI scheme and EELA. The source code of LGI-GT is available at https://github.com/shuoyinn/LGI-GT.
Thursday 24th August
10:15-11:15
CV: 3D Computer Vision (1/3)
#536
BPNet: Bézier Primitive Segmentation on 3D Point Clouds
This paper proposes BPNet, a novel end-to-end deep learning framework to learn Bézier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from Bézier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn Bézier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We also introduce a reconstruction module where we successfully process multiple CAD models with different primitives simultaneously. We conducted extensive experiments on the synthetic ABC dataset and real-scan datasets to validate and compare our approach with different baseline methods. Experiments show superior performance over previous work in terms of segmentation, with a substantially faster inference speed.
#2160
Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose
Domain gap between synthetic and real data in visual regression (e.g., 6D pose estimation) is bridged in this paper via global feature alignment and local refinement on the coarse classification of discretized anchor classes in target space, which imposes a piece-wise target manifold regularization into domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, revealing consistent cumulative target dependency across domains, to a self-training scheme (e.g., the popular Self-Paced Self-Training) to encourage more discriminative transferable representations of regression tasks. Moreover, learning unified implicit neural functions to estimate relative direction and distance of targets to their nearest class bins aims to refine target classification predictions, which can gain robust performance against inconsistent feature scaling sensitive to UDA regressors. Experiment results on three public benchmarks of the challenging 6D pose estimation task can verify the effectiveness of our method, consistently achieving superior performance to the state-of-the-art for UDA on 6D pose estimation. Codes and pre-trained models are available https://github.com/Gorilla-Lab-SCUT/MAST.
#607
Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint
3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because the physical contact is sensitive to small changes in pose, the high-nonlinear mapping between 3D object representation to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.
#847
Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method
Recovering the shape and appearance of real-world objects from natural 2D images is a long-standing and challenging inverse rendering problem. In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras. Our method follows an analysis-by-synthesis approach and consists of two phases. In the initialization phase, we use traditional SfM and MVS methods to reconstruct a virtual scene roughly matching the real scene. Then in the optimization phase, we adopt a hybrid approach to refine the geometry and reflectance, where the geometry is first optimized using an approximate differentiable rendering method, and the reflectance is optimized afterward using a physically-based differentiable rendering method. Our hybrid approach combines the efficiency of approximate methods with the high-quality results of physically-based methods. Extensive experiments on synthetic and real data demonstrate that our method can produce reconstructions with similar or higher quality than state-of-the-art methods while being more efficient.
#953
APR: Online Distant Point Cloud Registration through Aggregated Point Cloud Reconstruction
For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes. Code is available at https://github.com/liuQuan98/APR.
#2619
StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset
Modeling and capturing the 3D spatial arrangement of the human and the object is the key to perceiving 3D human-object interaction from monocular images. In this work, we propose to use the Human-Object Offset between anchors which are densely sampled from the surface of human mesh and object mesh to represent human-object spatial relation. Compared with previous works which use contact map or implicit distance filed to encode 3D human-object spatial relations, our method is a simple and efficient way to encode the highly detailed spatial correlation between the human and object. Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image. During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples based on this posterior distribution and minimizing the 2D-3D corresponding reprojection loss. Extensive experimental results show that our method achieves impressive results on two challenging benchmarks, BEHAVE and InterCap datasets. Our code has been publicly available at https://github.com/MoChen-bop/StackFLOW.
Thursday 24th August
10:15-11:15
CV: Recognition (Object Detection, Categorization) (3/3)
#1585
Domain-Adaptive Self-Supervised Face & Body Detection in Drawings
Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort. Our code can be accessed from https://github.com/barisbatuhan/DASS_Detector.
#1456
Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems. However, early efforts always consider boosting a single task unilaterally and neglecting others, seldom investigating their underlying connections for joint promotion. To overcome these limitations, we establish the hierarchical dual tasks-driven deep model to bridge these tasks. Concretely, we firstly construct an image fusion module to fuse complementary characteristics and cascade dual task-related modules, including a discriminator for visual effects and a semantic network for feature measurement.
We provide a bi-level perspective to formulate image fusion and follow-up downstream tasks. To incorporate distinct task-related responses for image fusion, we consider image fusion as a primary goal and dual modules as learnable constraints. Furthermore, we develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning. Extensive experiments demonstrate the superiority of our method, which not only produces visually pleasant fused results but also realizes significant promotion for detection and segmentation than the state-of-the-art approaches.
#3478
RaMLP: Vision MLP via Region-aware Mixing
Recently, MLP-based architectures achieved impressive results in image classification against CNNs and ViTs. However, there is an obvious limitation in that their parameters are related to image sizes, allowing them to process only fixed image sizes. Therefore, they cannot directly adapt dense prediction tasks (e.g., object detection and semantic segmentation) where images are of various sizes. Recent methods tried to address it but brought two new problems, long-range dependencies or important visual cues are ignored. This paper presents a new MLP-based architecture, Region-aware MLP (RaMLP), to satisfy various vision tasks and address the above three problems. In particular, we propose a well-designed module, Region-aware Mixing (RaM). RaM captures important local information and further aggregates these important visual clues. Based on RaM, RaMLP achieves a global receptive field even in one block. It is worth noting that, unlike most existing MLP-based architectures that adopt the same spatial weights to all samples, RaM is region-aware and adaptively determines weights to extract region-level features better. Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% Apb or 1.0% mIoU) on dense prediction tasks. The training code could be found at https://github.com/xiaolai-sqlai/RaMLP.
#902
Cross-Domain Facial Expression Recognition via Disentangling Identity Representation
Most existing cross-domain facial expression recognition (FER) works require target domain data to assist the model in analyzing distribution shifts to overcome negative effects. However, it is often hard to obtain expression images of the target domain in practical applications. Moreover, existing methods suffer from the interference of identity information, thus limiting the discriminative ability of the expression features. We exploit the idea of domain generalization (DG) and propose a representation disentanglement model to address the above problems. Specifically, we learn three independent potential subspaces corresponding to the domain, expression, and identity information from facial images. Meanwhile, the extracted expression and identity features are recovered as Fourier phase information reconstructed images, thereby ensuring that the high-level semantics of images remain unchanged after disentangling the domain information. Our proposed method can disentangle expression features from expression-irrelevant ones (i.e., identity and domain features). Therefore, the learned expression features exhibit sufficient domain invariance and discriminative ability. We conduct experiments with different settings on multiple benchmark datasets, and the results show that our method achieves superior performance compared with state-of-the-art methods.
#2309
Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as image caption problem, i.e., input a single-table image and output table structure description in a specific text format, e.g., HTML. With the impressive success of Transformer in text generation tasks, these methods use Transformer architecture to predict HTML table text in an autoregressive manner. However, tables always emerge with a large variety of shapes and sizes. Autoregressive models usually suffer from the error accumulation problem as the length of predicted text increases, which results in unsatisfactory performance for large tables. In this paper, we propose a novel image-to-text based TSR method that relieves error accumulation problems and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting HTML table cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates an HTML tag sequence with a vertical-and-horizontal two-step `scanning’, which better fits the inherent 2D structure of image data, (2) it performs substantially better for large tables (long sequence prediction) since it alleviates error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.
#194
MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels
Despite deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction to reduce the cost is to learn with noisy labels, which are ubiquitous in the real-world applications. A critical challenge for such a learning task is to reduce the effect of network memorization on the falsely-labeled data. In this work, we propose an iterative selection approach based on the Weibull mixture model, which identifies clean data by considering the overall learning dynamics of each data instance. In contrast to the previous small-loss heuristics, we leverage the observation that deep network is easy to memorize and hard to forget clean data. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times between being misclassified and being memorized in training, and integrate them into a novel metric for selection. Based on the proposed metric, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset, which is finally used for model training. To validate our method, we perform extensive experiments on synthetic noisy datasets and real-world web data, and our strategy outperforms existing noisy-label learning methods.
Thursday 24th August
10:15-11:15
Computer Vision (4/6)
#408
Learning Object Consistency and Interaction in Image Generation from Scene Graphs
This paper is concerned with synthesizing images conditioned on a scene graph (SG), a set of object nodes and their edges of interactive relations. We divide existing works into image-oriented and code-oriented methods. In our analysis, the image-oriented methods do not consider object interaction in spatial hidden feature. On the other hand, in empirical study, the code-oriented methods lose object consistency as their generated images miss certain objects in the input scene graph. To alleviate these two issues, we propose Learning Object Consistency and Interaction (LOCI). To preserve object consistency, we design a consistency module with a weighted augmentation strategy for objects easy to be ignored and a matching loss between scene graphs and image codes. To learn object interaction, we design an interaction module consisting of three kinds of message propagation between the input scene graph and the learned image code. Experiments on COCO-stuff and Visual Genome datasets show our proposed method alleviates the ignorance of objects and outperforms the state-of-the-art on visual fidelity of generated images and objects.
#3654
Dual Prompt Learning for Continual Rain Removal from Single Images
Recent efforts have achieved remarkable progress on single image deraining on the stationary distributed data. However, catastrophic forgetting raises practical concerns when applying these methods to real applications, where the data distributions change constantly. In this paper, we investigate the continual learning issue for rain removal and develop a novel efficient continual learned deraining transformer. Different from the typical replay or regularization-based methods that increase overall training time or parameter space, our method relies on compact prompts which are learnable parameters, to maintain both task-invariant and task-specific knowledge. Our prompts are applied at both image and feature levels to leverage effectively transferred knowledge of images and features among different tasks. We conduct comprehensive experiments under widely-used rain removal datasets, where our proposed dual prompt learning consistently outperforms prior state-of-the-art methods. Moreover, we observe that, even though our method is designed for continual learning, it still achieves superior results on the stationary distributed data, which further demonstrates the effectiveness of our method. Our website is available at: http://liuminghao.com.cn/DPL/.
#879
Data Level Lottery Ticket Hypothesis for Vision Transformers
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method, called the winning ticket, such that it can be trained from scratch to almost as good as the dense counterpart. Meanwhile, the research of LTH in vision transformers (ViTs) is scarcely evaluated. In this paper, we first show that the conventional winning ticket is hard to find at weight level of ViTs by existing methods. Then, we generalize the LTH for ViTs to input data consisting of image patches inspired by the input dependence of ViTs. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH for further verifying the integrity of our theory. The Source codes are available at https://github.com/shawnricecake/vit-lottery-ticket-input.
#2002
A Novel Learnable Interpolation Approach for Scale-Arbitrary Image Super-Resolution
Deep convolutional neural networks (CNNs) have achieved unprecedented success in single image super-resolution over the past few years. Meanwhile, there is an increasing demand for single image super-resolution with arbitrary scale factors in real-world scenarios. Many approaches adopt scale-specific multi-path learning to cope with multi-scale super-resolution with a single network. However, these methods require a large number of parameters. To achieve a better balance between the reconstruction quality and parameter amounts, we proposes a learnable interpolation method that leverages the advantages of neural networks and interpolation methods to tackle the scale-arbitrary super-resolution task. The scale factor is treated as a function parameter for generating the kernel weights for the learnable interpolation. We demonstrate that the learnable interpolation builds a bridge between neural networks and traditional interpolation methods. Experiments show that the proposed learnable interpolation requires much fewer parameters and outperforms state-of-the-art super-resolution methods.
#1953
Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking
Tracking individuals is a vital part of many experiments conducted to understand collective behaviour. Ants are the paradigmatic model system for such experiments but their lack of individually distinguishing visual features and their high colony densities make it extremely difficult to perform reliable racking automatically. Additionally, the wide diversity of their species’ appearances makes a generalized approach even harder. In this paper, we propose a data-driven multi-object tracker that, for the first time, employs domain adaptation to achieve the required generalisation. This approach is built upon a joint-detection-and-tracking framework that is extended by a set of domain discriminator modules integrating an adversarial training strategy in addition to the tracking loss. In addition to this novel domain-adaptive tracking framework, we present a new dataset and a benchmark for the ant tracking problem. The dataset contains 57 video sequences with full trajectory annotation, including 30k frames captured from two different ant species moving on different background patterns. It comprises 33 and 24 sequences for source and target domains, respectively. We compare our proposed framework against other domain-adaptive and non-domain-adaptive multi-object tracking baselines using this dataset and show that incorporating domain adaptation at multiple levels of the tracking pipeline yields significant improvements. The code and the dataset are available at https://github.com/chamathabeysinghe/da-tracker.
Thursday 24th August
10:15-11:15
KRR: Argumentation
#3935
Ranking-based Argumentation Semantics Applied to Logical Argumentation
In formal argumentation, a distinction can be made between extension-based semantics, where sets of
arguments are either (jointly) accepted or not, and ranking-based semantics, where grades of accept-
ability are assigned to arguments. Another important distinction is that between abstract approaches,
that abstract away from the content of arguments, and structured approaches, that specify a method
of constructing argument graphs on the basis of a knowledge base. While ranking-based semantics
have been extensively applied to abstract argumentation, few work has been done on ranking-based
semantics for structured argumentation. In this paper, we make a systematic investigation into the be-
haviour of ranking-based semantics applied to existing formalisms for structured argumentation. We
show that a wide class of ranking-based semantics gives rise to so-called culpability measures, and
are relatively robust to specific choices in argument construction methods.
#1639
Bipolar Abstract Dialectical Frameworks Are Covered by Kleene’s Three-valued Logic
Abstract dialectical frameworks (ADFs) are one of the most powerful generalizations of classical Dung-style argumentation frameworks (AFs).
The additional expressive power comes with an increase in computational complexity, namely one level up in the polynomial hierarchy in comparison to
their AF counterparts. However, there is one important subclass, so-called bipolar ADFs (BADFs) which are as complex as classical AFs while offering strictly more modeling capacities. This property makes BADFs very attractive from a knowledge representation point of view and is the main reason why this class has received much attention recently. The semantics of ADFs rely on the Gamma-operator which takes as an input a three-valued interpretation and returns a new one. However, in order to obtain the output the original definition requires to consider any two-valued completion of a given three-valued interpretation. In this paper we formally prove that in case of BADFs we may bypass the computationally intensive procedure via applying Kleene’s three-valued logic K. We therefore introduce the so-called bipolar disjunctive normal form which is simply a disjunctive normal form where any used atom possesses either a positive or a negative polarity. We then show that: First, this normal form is expressive enough to represent any BADF and secondly, the computation can be done via Kleene’s K instead of dealing with two-valued completions. Inspired by the main correspondence result we present some first experiments showing the computational benefit of using Kleene.
#797
Leveraging Argumentation for Generating Robust Sample-based Explanations
Explaining predictions made by inductive classifiers has become crucial with the rise of complex models acting more and more as black-boxes.
Abductive explanations are one of the most popular types of explanations that are provided for the purpose. They highlight feature-values that
are sufficient for making predictions. In the literature, they are generated by exploring the whole feature space, which is unreasonable in practice.
This paper solves the problem by introducing explanation functions that generate abductive explanations from a sample of instances. It shows
that such functions should be defined with great care since they cannot satisfy two desirable properties at the same time, namely existence of
explanations for every individual decision (success) and correctness of explanations (coherence). The paper provides a parameterized family of
argumentation-based explanation functions, each of which satisfies one of the two properties. It studies their formal properties and their experimental
behaviour on different datasets.
#4087
Quantitative Reasoning and Structural Complexity for Claim-Centric Argumentation
Argumentation is a well-established formalism for nonmonotonic reasoning and a vibrant area of research in AI. Claim-augmented argumentation frameworks (CAFs) have been introduced to deploy a conclusion-oriented perspective. CAFs expand argumentation frameworks by an additional step which involves retaining claims for an accepted set of arguments. We introduce a novel concept of a justification status for claims, a quantitative measure of extensions supporting a particular claim. The well-studied problems of credulous and skeptical reasoning can then be seen as simply the two endpoints of the spectrum when considered as a justification level of a claim. Furthermore, we explore the parameterized complexity of various reasoning problems for CAFs, including the quantitative reasoning for claim assertions. We begin by presenting a suitable graph representation that includes arguments and their associated claims. Our analysis includes the parameter treewidth, and we present decomposition-guided reductions between reasoning problems in CAF and the validity problem for QBF.
#4431
Preferences and Constraints in Abstract Argumentation
In recent years there has been an increasing interest in extending Dung’s framework to facilitate the knowledge representation and reasoning process.
In this paper, we present an extension of Abstract Argumentation Framework (AF) that allows for the representation of preferences over arguments’ truth values (3-valued preferences).
For instance, we can express a preference stating that extensions where argument a is false (i.e. defeated) are preferred to extensions where argument b is false.
Interestingly, such a framework generalizes the well-known Preference-based AF with no additional cost in terms of computational complexity for most of the classical argumentation semantics.
Then, we further extend AF by considering both (3-valued) preferences and 3-valued constraints, that is constraints of the form \varphi \Rightarrow v or v \Rightarrow \varphi, where \varphi is a logical formula and v is a 3-valued truth value.
After investigating the complexity of the resulting framework,as both constraints and preferences may represent subjective knowledge of agents,
we extend our framework by considering multiple agents and study the complexity of deciding acceptance of arguments in this context.
Thursday 24th August
10:15-11:15
DM: Mining Graphs (2/2)
#1132
Imbalanced Node Classification Beyond Homophilic Assumption
Imbalanced node classification widely exists in real-world networks where graph neural networks (GNNs) are usually highly inclined to majority classes and suffer from severe performance degradation on classifying minority class nodes. Various imbalanced node classification methods have been proposed recently which construct synthetic nodes and edges w.r.t. minority classes to balance the label/topology distribution. However, they are all based on homophilic assumption that nodes of the same label tend to connect despite the widely existence of heterophilic edges in real-world graphs. Thus, they uniformly aggregate features from both homophilic and heterophilic neighbors and rely on feature similarity to generate synthetic edges, which cannot be applied to imbalanced graphs in high heterophily. To address this problem, we propose a novel GraphSANN for imbalanced node classification on both homophilic and heterophilic graphs. Firstly, we propose a unified feature mixer to generate synthetic nodes with both homophilic and heterophilic interpolation in a unified way. Next, by randomly sampling edges between synthetic nodes and existing nodes as candidata edges, we design an adaptive subgraph extractor to dynamically extract the contextual subgraphs of candidate edges with flexible ranges. Finally, we develop a multi-filter subgraph encoder which constructs multiple different filter channels to discriminatively aggregate neighbors’ information along the homophilic and heterophilic edges. Extensive experiments on eight benchmark datasets demonstrate the superiority of our model for imbalanced node classificaiton on both homophilic and heterophilic graphs.
#2466
CONGREGATE: Contrastive Graph Clustering in Curvature Spaces
Graph clustering is a longstanding research topic, and has achieved remarkable success with the deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts the deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach where we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.
#1424
Dynamic Group Link Prediction in Continuous-Time Interaction Network
Recently, group link prediction has received increasing attention due to its important role in analyzing relationships between individuals and groups. However, most existing group link prediction methods emphasize static settings or only make cursory exploitation of historical information, so they fail to obtain good performance in dynamic applications. To this end, we attempt to solve the group link prediction problem in continuous-time dynamic scenes with fine-grained temporal information. We propose a novel continuous-time group link prediction method CTGLP to capture the patterns of future link formation between individuals and groups. A new graph neural network CTGNN is presented to learn the latent representations of individuals by biasedly aggregating neighborhood information. Moreover, we design an importance-based group modeling function to model the embedding of a group based on its known members. CTGLP eventually learns a probability distribution and predicts the link target. Experimental results on various datasets with and without unseen nodes show that CTGLP outperforms the state-of-the-art methods by 13.4% and 13.2% on average.
#5148
Intent-aware Recommendation via Disentangled Graph Contrastive Learning
Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to the powerful learning ability from user behavior data. Understanding the user intents from behavior data is the key to recommender systems, which poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents especially when the user behavior is usually inadequate in reality. The other is different behaviors have different intent distributions, so how to establish their relations for a more explainable recommender system. In this paper, we present the Intent-aware Recommendation via Disentangled Graph Contrastive Learning (IDCL), which simultaneously learns interpretable intents and behavior distributions over those intents. Specifically, we first model the user behavior data as a user-item-concept graph, and design a GNN based behavior disentangling module to learn the different intents. Then we propose the intent-wise contrastive learning to enhance the intent disentangling and meanwhile infer the behavior distributions. Finally, the coding rate reduction regularization is introduced to make the behaviors of different intents orthogonal. Extensive experiments demonstrate the effectiveness of IDCL in terms of substantial improvement and the interpretability.
#1846
Gapformer: Graph Transformer with Graph Pooling for Node Classification
Graph Transformers (GTs) have proved their advantage in graph-level tasks. However, existing GTs still perform unsatisfactorily on the node classification task due to 1) the overwhelming unrelated information obtained from a vast number of irrelevant distant nodes and 2) the quadratic complexity regarding the number of nodes via the fully connected attention mechanism. In this paper, we present Gapformer, a method for node classification that deeply incorporates Graph Transformer with Graph Pooling. More specifically, Gapformer coarsens the large-scale nodes of a graph into a smaller number of pooling nodes via local or global graph pooling methods, and then computes the attention solely with the pooling nodes rather than all other nodes. In such a manner, the negative influence of the overwhelming unrelated nodes is mitigated while maintaining the long-range information, and the quadratic complexity is reduced to linear complexity with respect to the fixed number of pooling nodes. Extensive experiments on 13 node classification datasets, including homophilic and heterophilic graph datasets, demonstrate the competitive performance of Gapformer over existing Graph Neural Networks and GTs.
Thursday 24th August
10:15-11:15
GTEP: Computational Social Choice (1/2)
#4271
Algorithmics of Egalitarian versus Equitable Sequences of Committees
We study the election of sequences of committees, where in each of tau levels (e.g. modeling points in time) a committee consisting of k candidates from a common set of m candidates is selected. For each level, each of n agents (voters) may nominate one candidate whose selection would satisfy her. We are interested in committees which are good with respect to the satisfaction per day and per agent. More precisely, we look for egalitarian or equitable committee sequences. While both guarantee that at least x agents per day are satisfied, egalitarian committee sequences ensure that each agent is satisfied in at least y levels while equitable committee sequences ensure that each agent is satisfied in exactly y levels. We analyze the parameterized complexity of finding such committees for the parameters n, m, k, tau, x, and y, as well as combinations thereof.
#4760
Diversity, Agreement, and Polarization in Elections
We consider the notions of agreement, diversity, and polarization in ordinal elections (that is, in elections where voters rank the candidates). While (computational) social choice offers good measures of agreement between the voters, such measures for the other two notions are lacking. We attempt to rectify this issue by designing appropriate measures, providing means of their (approximate) computation, and arguing that they, indeed, capture diversity and polarization well. In particular, we present “maps of preference orders” that highlight relations between the votes in a given election and which help in making arguments about their nature.
#4526
Convergence in Multi-Issue Iterative Voting under Uncertainty
We study strategic behavior in iterative plurality voting for multiple issues under uncertainty. We introduce a model synthesizing simultaneous multi-issue voting with Meir et al. [2014]’s local dominance theory, in which agents repeatedly update their votes based on sets of vote profiles they deem possible, and determine its convergence properties. After demonstrating that local dominance improvement dynamics may fail to converge, we present two sufficient model refinements that guarantee convergence from any initial vote profile for binary issues: constraining agents to have O-legal preferences, where issues are ordered by importance, and endowing agents with less uncertainty about issues they are modifying than others. Our empirical studies demonstrate that while cycles are common for agents without uncertainty, introducing uncertainty makes convergence almost guaranteed in practice.
#J5946
Learning to Design Fair and Private Voting Rules (Extended Abstract)
Voting is used widely to aggregate preferences to make a collective decision. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. First, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness.
Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy.
#4367
Measuring and Controlling Divisiveness in Rank Aggregation
In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of our divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.
#1603
Error in the Euclidean Preference Model
Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their “ideal point”, as measured by the Euclidean metric. However, previous work has shown there are ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.
Thursday 24th August
10:15-11:15
Agent-based and Multi-agent Systems (3/4)
#4783
Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives
This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system’s behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.
#SC18
Half-Positional Objectives Recognized by Deterministic Büchi Automata (Extended Abstract)
In two-player zero-sum games on graphs, the protagonist tries to achieve an objective while the antagonist aims to prevent it. Objectives for which both players do not need to use memory to play optimally are well-understood and characterized both in finite and infinite graphs. Less is known about the larger class of half-positional objectives, i.e., those for which the protagonist does not need memory (but for which the antagonist might). In particular, no characterization of half-positionality is known for the central class of ω-regular objectives.
Here, we characterize objectives recognizable by deterministic Büchi automata (a class of ω-regular objectives) that are half-positional, both over finite and infinite graphs. This characterization yields a polynomial-time algorithm to decide half-positionality of an objective recognized by a given deterministic Büchi automaton.
#1817
Explainable Multi-Agent Reinforcement Learning for Temporal Queries
As multi-agent reinforcement learning (MARL) systems are increasingly deployed throughout society, it is imperative yet challenging for users to understand the emergent behaviors of MARL agents in complex environments. This work presents an approach for generating policy-level contrastive explanations for MARL to answer a temporal user query, which specifies a sequence of tasks completed by agents with possible cooperation. The proposed approach encodes the temporal query as a PCTL* logic formula and checks if the query is feasible under a given MARL policy via probabilistic model checking. Such explanations can help reconcile discrepancies between the actual and anticipated multi-agent behaviors. The proposed approach also generates correct and complete explanations to pinpoint reasons that make a user query infeasible. We have successfully applied the proposed approach to four benchmark MARL domains (up to 9 agents in one domain). Moreover, the results of a user study show that the generated explanations significantly improve user performance and satisfaction.
#399
Asynchronous Communication Aware Multi-Agent Task Allocation
Multi-agent task allocation in physical environments with spatial and temporal constraints, are hard problems that are relevant in many realistic applications. A task allocation algorithm based on Fisher market clearing (FMC_TA), that can be performed either centrally or distributively, has been shown to produce high quality allocations in comparison to both centralized and distributed state of the art incomplete optimization algorithms. However, the algorithm is synchronous and therefore depends on perfect communication between agents.
We propose FMC_ATA, an asynchronous version of FMC_TA, which is robust to message latency and message loss. In contrast to the former version of the algorithm, FMC_ATA allows agents to identify dynamic events and initiate the generation of an updated allocation. Thus, it is more compatible for dynamic environments. We further investigate the conditions in which the distributed version of the algorithm is preferred over the centralized version. Our results indicate that the proposed asynchronous distributed algorithm produces consistent results even when the communication level is extremely poor.
#4520
Discounting in Strategy Logic
Discounting is an important dimension in multi-agent systems as long as we want to reason about strategies and time. It is a key aspect in economics as it captures the intuition that the far-away future is not as important as the near future. Traditional verification techniques allow to check whether there is a winning strategy for a group of agents but they do not take into account the fact that satisfying a goal sooner is different from satisfying it after a long wait.
In this paper, we augment Strategy Logic with future discounting over a set of discounted functions D, denoted SL[D]. We consider “until” operators with discounting functions: the satisfaction value of a specification in SL[D] is a value in [0, 1], where the longer it takes to fulfill requirements, the smaller the satisfaction value is. We motivate our approach with classical examples from Game Theory and study the complexity of model-checking SL[D]-formulas.
#J5937
Data-Driven Revision of Conditional Norms in Multi-Agent Systems (Extended Abstract)
In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norms aimed at preventing agents from exhibiting certain patterns of behaviors. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives.
Thursday 24th August
10:15-11:15
CSO: Satisfiabilty
#3364
Fast Algorithms for SAT with Bounded Occurrences of Variables
We present fast algorithms for the general CNF satisfiability problem (SAT) with running-time bound O*({c_d}^n), where c_d is a function of the maximum occurrence d of variables (d can also be the average occurrence when each variable appears at least twice), and n is the number of variables in the input formula. Similar to SAT with bounded clause lengths, SAT with bounded occurrences of variables has also been extensively studied in the literature. Especially, the running-time bounds for small values of d, such as d=3 and d=4, have become bottlenecks for algorithms evaluated by the formula length L and other algorithms. In this paper, we show that SAT can be solved in time O*(1.1238^n) for d=3 and O*(1.2628^n) for d=4, improving the previous results O*(1.1279^n) and O*(1.2721^n) obtained by Wahlström (SAT 2005) nearly 20 years ago. For d>=5, we obtain a running time bound of O*(1.0641^{dn}), implying a bound of O*(1.0641^L) with respect to the formula length L, which is also a slight improvement over the previous bound.
#SC17
Certified CNF Translations for Pseudo-Boolean Solving
#J5934
Proofs and Certificates for Max-SAT (Extended Abstract)
In this paper, we present a tool, called MS-Builder, which generates certificates for the Max-SAT problem in the particular form of a sequence of equivalence-preserving transformations. To generate a certificate, MS-Builder iteratively calls a SAT oracle to get a SAT resolution refutation which is handled and adapted into a sound refutation for Max-SAT. In particular, the size of the computed Max-SAT refutation is linear with respect to the size of the initial refutation if it is semi-read-once, tree-like regular, tree-like or semi-tree-like. Additionally, we propose an extendable tool, called MS-Checker, able to verify the validity of any Max-SAT certificate using Max-SAT inference rules.
#4078
Co-Certificate Learning with SAT Modulo Symmetries
We present a new SAT-based method for generating all graphs up to isomorphism that satisfy a given co-NP property. Our method extends the SAT Modulo Symmetry (SMS) framework with a technique that we call co-certificate learning. If SMS generates a candidate graph that violates the given co-NP property,
we obtain a certificate for this violation, i.e., `co-certificate’ for the co-NP property. The co-certificate gives rise to a clause that the SAT solver, serving as SMS’s backend, learns as part of its CDCL procedure. We demonstrate that SMS plus co-certificate learning is a powerful method that allows us to improve the best-known lower bound on the size of Kochen-Specker vector systems, a problem that is central to the foundations of quantum mechanics and has been studied for over half a century. Our approach is orders of magnitude faster and scales significantly better than a recently proposed SAT-based method.
#1763
A New Variable Ordering for In-processing Bounded Variable Elimination in SAT Solvers
Bounded Variable Elimination (BVE) is an important Boolean formula simplification technique in which the variable ordering is crucial. We define a new variable ordering based on variable activity, called ESA (variable Elimination Scheduled by Activity), for in-processing BVE in Conflict-Driven Clause Learning (CDCL) SAT solvers, and incorporate it into several state-of-the-art CDCL SAT solvers. Experimental results show that the new ESA ordering consistently makes these solvers solve more instances on the benchmark set including all the 5675 instances used in the Crafted, Application and Main tracks of all SAT Competitions up to 2022. In particular, one of these solvers with ESA, Kissat_MAB_ESA, won the Anniversary track of the SAT Competition 2022. The behaviour of ESA and the reason of its effectiveness are also analyzed.
#J5942
SAT Encodings for Pseudo-Boolean Constraints Together With At-Most-One Constraints (Extended Abstract)
When solving a combinatorial problem using propositional satisfiability (SAT), the encoding of the constraints is of vital importance.
Pseudo-Boolean (PB) constraints appear frequently in a wide variety of problems. When PB constraints occur together with at-most-one (AMO) constraints over the same variables, they can be combined into PB(AMO) constraints.
In this paper we present new encodings for PB(AMO) constraints.
Our experiments show that these encodings can be substantially smaller than those of PB constraints and allow many more instances to be solved within a time limit.
We also observed that there is no single overall winner among the considered encodings, but efficiency of each encoding may depend on PB(AMO) characteristics such as the magnitude of coefficient values.
Thursday 24th August
10:15-11:15
AI for Social Good – Ethics, trust, fairness
#AI4SG5761
CGS: Coupled Growth and Survival Model with Cohort Fairness
Fish modeling in complex environments is critical for understanding drivers of population dynamics in aquatic systems. This paper proposes a Bayesian network method for modeling fish survival and growth over multiple connected rivers. Traditional fish survival models capture the effect of multiple environmental drivers (e.g., stream temperature, stream flow) by adding different variables, which increases model complexity and results in very long and impractical run times (i.e., weeks). We propose a coupled survival-growth model that leverages the observations from both sources simultaneously. It also integrates the Bayesian process into the neural network model to efficiently capture complex variable relationships in the system while also conforming to known survival processes used in existing fish models. To further reduce the performance disparity of fish body length across cohorts, we propose two approaches for enforcing fairness by the adjustment of training priorities and data augmentation. The results based on a real-world fish dataset collected in Massachusetts, US demonstrate that the proposed method can greatly improve prediction accuracy in modeling survival and body length compared to independent models on survival and growth, and effectively reduce the performance disparity across cohorts. The fish growth and movement patterns discovered by the proposed model are also consistent with prior studies in the same region, while vastly reducing run times and memory requirements.
#AI4SG5795
Addressing Weak Decision Boundaries in Image Classification by Leveraging Web Search and Generative Models
Machine learning (ML) technologies are known to be riddled with ethical and operational problems, however, we are witnessing an increasing thrust by businesses to deploy them in sensitive applications. One major issue among many is that ML models do not perform equally well for underrepresented groups. This puts vulnerable populations in an even disadvantaged and unfavorable position. We propose an approach that leverages the power of web search and generative models to alleviate some of the shortcomings of discriminative models. We demonstrate our method on an image classification problem using ImageNet’s People Subtree subset, and show that it is effective in enhancing robustness and mitigating bias in certain classes that represent vulnerable populations (e.g., female doctor of color). Our new method is able to (1) identify weak decision boundaries for such classes; (2) construct search queries for Google as well as text for generating images through DALL-E 2 and Stable Diffusion; and (3) show how these newly captured training samples could alleviate population bias issue. While still improving the model’s overall performance considerably, we achieve a significant reduction (77.30%) in the model’s gender accuracy disparity. In addition to these improvements, we observed a notable enhancement in the classifier’s decision boundary, as it is characterized by fewer weakspots and an increased separation between classes. Although we showcase our method on vulnerable populations in this study, the proposed technique is extendable to a wide range of problems and domains.
#AI4SG5817
Toward Job Recommendation for All
This paper presents a job recommendation algorithm designed and validated in the context of the French Public Employment Service. The challenges, owing to the confidential data policy, are related with the extreme sparsity of the interaction matrix and the mandatory scalability of the algorithm, aimed to deliver recommendations to millions of job seekers in quasi real-time, considering hundreds of thousands of job ads. The experimental validation of the approach shows similar or better performances than the state of the art in terms of recall, with a gain in inference time of 2 orders of magnitude. The study includes some fairness analysis of the recommendation algorithm. The gender-related gap is shown to be statistically similar in the true data and in the counter-factual data built from the recommendations.
#AI4SG5859
For Women, Life, Freedom: Social Web Analyses of a Watershed Moment of Iran’s Gender Struggles
In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI system. Our annotators not only provide labels, but they also suggest valuable keywords for more meaningful corpus creation as well as provide short example documents for a guided sampling step. Our analyses indicate that Mahsa Amini’s death triggered polarized Persian language discourse where both fractions of negative and positive tweets toward gender equality increased. The increase in positive tweets was slightly greater than the increase in negative tweets. We also observe that with respect to account creation time, between the state-aligned Twitter accounts and pro-protest Twitter accounts, pro-protest accounts are more similar to baseline Persian Twitter activity.
#AI4SG5422
Fast and Differentially Private Fair Clustering
This study presents the first differentially private and fair clustering method, built on the recently proposed density-based fair clustering approach. The method addresses the limitations of fair clustering algorithms that necessitate the use of sensitive personal information during training or inference phases. Two novel solutions, the Gaussian mixture density function and Voronoi cell, are proposed to enhance the method’s performance in terms of privacy, fairness, and utility compared to previous methods. The experimental results on both synthetic and real-world data confirm the compatibility of the proposed method with differential privacy, achieving a better fairness-utility trade-off than existing methods when privacy is not considered. Moreover, the proposed method requires significantly less computation time, being at least 3.7 times faster than the state-of-the-art.
Thursday 24th August
11:45-12:45
Machine Learning (8/12)
#2339
Generalized Discriminative Deep Non-Negative Matrix Factorization Based on Latent Feature and Basis Learning
As a powerful tool for data representation, deep NMF has attracted much attention in recent years. Current deep NMF builds the multi-layer structure by decomposing either basis matrix or feature matrix into multiple factors, and probably complicates the learning process when data is insufficient or exhibits simple structure. To overcome the limitations, a novel method called Generalized Deep Non-negative Matrix Factorization (GDNMF) is proposed, which generalizes several NMF and deep NMF methods in a unified framework. GDNMF simultaneously performs decomposition on both features and bases, which learns a hierarchical data representation based on multi-level basis. To further improve the latent representation and enhance its flexibility, GDNMF mutually reinforces shallow linear model and deep non-linear model. Moreover, semi-supervised GDNMF is proposed by treating partial label information as soft constraints in the multi-layer structure. An efficient two-phase optimization algorithm is developed, and experiments on five real-world datesets verify its superior performance compared with state-of-the-art methods.
#SV5563
Towards Utilitarian Online Learning — A Review of Online Algorithms in Open Feature Space
Human intelligence comes from the capability to describe and make sense of the world surrounding us, often in a lifelong manner. Online Learning (OL) allows a model to simulate this capability, which involves processing data in sequence, making predictions, and learning from predictive errors. However, traditional OL assumes a fixed set of features to describe data, which can be restrictive. In reality, new features may emerge and old features may vanish or become obsolete, leading to an open
feature space. This dynamism can be caused by more advanced or outdated technology for sensing the world, or it can be a natural process of evolution. This paper reviews recent breakthroughs that strived to enable OL in open feature spaces, referred to as Utilitarian Online Learning (UOL). We taxonomize existing UOL models into three categories, analyze their pros and cons, and discuss their application scenarios. We also benchmark the performance of representative UOL models, highlighting open problems, challenges, and potential future directions of this emerging topic.
#990
ProMix: Combating Label Noise via Maximizing Clean Sample Utility
Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To fulfill this, we propose a novel LNL framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and matched predictions with given labels to dynamically expand a base clean sample set. To overcome the potential side effect of excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.
#1604
A Fast Adaptive Randomized PCA Algorithm
It is desirable to adaptively determine the number of dimensions (rank) for PCA according to a given tolerance of low-rank approximation error. In this work, we aim to develop a fast algorithm solving this adaptive PCA problem. We propose to replace the QR factorization in randQB_EI algorithm with matrix multiplication and inversion of small matrices, and propose a new error indicator to incrementally evaluate approximation error in Frobenius norm. Combining the shifted power iteration technique for better accuracy, we finally build up an algorithm named farPCA. Experimental results show that farPCA is much faster than the baseline methods (randQB_EI, randUBV and svds) in practical setting of multi-thread computing, while producing nearly optimal results of adpative PCA.
#1252
Latent Processes Identification From Multi-View Time Series
Understanding the dynamics of time series data typically requires identifying the unique latent factors for data generation, a.k.a., latent processes identification. Driven by the independent assumption, existing works have made great progress in handling single-view data. However, it is a non-trivial problem that extends them to multi-view time series data because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independent assumption; (ii) the factors from different views are generally overlapped and are hard to be aggregated to a complete set. In this work, we propose a novel framework MuLTI that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by the establishment of an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series. The code is available on https://github.com/lccurious/MuLTI.
#SV5484
Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities
Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation. As an essential component of the architecture, graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these works. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods for graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods with a mathematical summary for each category; 2) then, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) next, we further outline the applications that incorporate the idea of graph pooling in a variety of domains; 4) finally, we discuss certain critical challenges facing current studies and share our insights on future potential directions for research on the improvement of graph pooling.
Thursday 24th August
11:45-12:45
Machine Learning (9/12)
#SC8
Harnessing Neighborhood Modeling and Asymmetry Preservation for Digraph Representation Learning
#2757
Bayesian Optimization with Switching Cost: Regret Analysis and Lookahead Variants
Bayesian Optimization (BO) has recently received increasing attention due to its efficiency in optimizing expensive-to-evaluate functions. For some practical problems, it is essential to consider the path-dependent switching cost between consecutive sampling locations given a total traveling budget. For example, when using a drone to locate cracks in a building wall or search for lost survivors in the wild, the search path needs to be efficiently planned given the limited battery power of the drone. Tackling such problems requires a careful cost-benefit analysis of candidate locations and balancing exploration and exploitation. In this work, we formulate such a problem as a constrained Markov Decision Process (MDP) and solve it by proposing a new distance-adjusted multi-step look-ahead acquisition function, the distUCB, and using rollout approximation. We also provide a theoretical regret analysis of the distUCB-based Bayesian optimization algorithm. In addition, the empirical performance of the proposed algorithm is tested based on both synthetic and real data experiments, and it shows that our cost-aware non-myopic algorithm performs better than other popular alternatives.
#4090
Singularformer: Learning to Decompose Self-Attention to Linearize the Complexity of Transformer
Transformers achieve excellent performance in a variety of domains since they can capture long-distance dependencies through the self-attention mechanism. However, self-attention is computationally costly due to its quadratic complexity and high memory consumption. In this paper, we propose a novel Transformer variant (Singularformer) that uses neural networks to learn the singular value decomposition process of the attention matrix to design a linear-complexity and memory-efficient global self-attention mechanism. Specifically, we decompose the attention matrix into the product of three matrix factors based on singular value decomposition and design neural networks to learn these matrix factors, then the associative law of matrix multiplication is used to linearize the calculation of self-attention. The above procedure allows us to compute self-attention as two-dimensional reduction processes in the first and second token dimensional spaces, followed by a multi-head self-attention computational process on the first dimensional reduced token features. Experimental results on 8 real-world datasets demonstrate that Singularformer performs favorably against the other Transformer variants with lower time and space complexity. Our source code is publicly available at https://github.com/CSUBioGroup/Singularformer.
#3667
ActUp: Analyzing and Consolidating tSNE and UMAP
TSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study their full span of differences. We theoretically and experimentally evaluate the space of parameters in the TSNE and UMAP algorithms and observe that a single one — the normalization — is responsible for switching between them. This, in turn, implies that a majority of the algorithmic differences can be toggled without affecting the embeddings. We discuss the implications this has on several theoretic claims behind UMAP, as well as how to reconcile them with existing TSNE interpretations.
Based on our analysis, we provide a method (GDR) that combines previously incompatible techniques from TSNE and UMAP and can replicate the results of either algorithm. This allows our method to incorporate further improvements, such as an acceleration that obtains either method’s outputs faster than UMAP. We release improved versions of TSNE, UMAP, and GDR that are fully plug-and-play with the traditional libraries.
#100
Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi’s Entropy Perspective
Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information theoretical measure: kernelized Rényi’s entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon’s entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Rényi’s entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
#SC25
Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks (Extended Abstract)
Graph neural networks (GNNs) have emerged due to their success at modeling graph data. Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs come into play. To avoid communication caused by expensive data movement between workers, we propose SANCUS, a staleness-aware communication-avoiding decentralized GNN system. By introducing a set of novel bounded embedding staleness metrics and adaptively skipping broadcasts, SANCUS abstracts decentralized GNN processing as sequential matrix multiplication and uses historical embeddings via cache. Theoretically, we show bounded approximation errors of embeddings and gradients with convergence guarantee. Empirically, we evaluate SANCUS with common GNN models via different system setups on large-scale benchmark datasets. Compared to SOTA works, SANCUS can avoid up to 74% communication with at least 1:86_ faster throughput on average without accuracy loss.
Thursday 24th August
11:45-12:45
ML: Federated Learning (2/3)
#648
HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID setting.
#2045
FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment
Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such kinds of methods require clients to download others’ models. It not only sheer increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by the local one-step ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.
#3308
Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning
Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL approaches cannot manage the mutual influence among multiple data consumers competing to enlist data owners. Moreover, they cannot support a single data owner to join multiple data consumers simultaneously. To bridge these gaps, we propose the Multi-Agent Reinforcement Learning for AFL (MARL-AFL) approach to steer data consumers to bid strategically
towards an equilibrium with desirable overall system characteristics. We design a temperature-based reward reassignment scheme to make tradeoffs between cooperation and competition among AFL data consumers. In this way, it can reach an equilibrium state that ensures individual data consumers can achieve good utility, while preserving system-level social welfare. To circumvent potential collusion behaviors among data consumers, we introduce a bar agent to set a personalized bidding
lower bound for each data consumer. Extensive experiments on six commonly adopted benchmark datasets show that MARL-AFL is significantly more advantageous compared to six state-of-the-art approaches, outperforming the best by 12.2%, 1.9% and 3.4% in terms of social welfare, revenue and accuracy, respectively.
#SV5554
Bayesian Federated Learning: A Survey
Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client- and server-side and FL-based BFL methods and their pros and cons. The limitations of the existing BFL methods and the future directions of BFL research further address the intricate requirements of real-life FL applications.
#4651
Modeling with Homophily Driven Heterogeneous Data in Gossip Learning
Training deep learning models on data distributed and local to edge devices such as mobile phones is a prominent recent research direction. In a Gossip Learning (GL) system, each participating device maintains a model trained on its local data and iteratively aggregates it with the models from its neighbours in a communication network. While the fully distributed operation in GL comes with natural advantages over the centralized orchestration in Federated Learning (FL), its convergence becomes particularly slow when the data distribution is heterogeneous and aligns with the clustered structure of the communication network. These characteristics are pervasive across practical applications as people with similar interests (thus producing similar data) tend to create communities.
This paper proposes a data-driven neighbor weighting strategy for aggregating the models: this enables faster diffusion of knowledge across the communities in the network and leads to quicker convergence. We augment the method to make it computationally efficient and fair: the devices quickly converge to the same model. We evaluate our model on real and synthetic datasets that we generate using a novel generative model for communication networks with heterogeneous data. Our exhaustive empirical evaluation verifies that our proposed method attains a faster convergence rate than the baselines. For example, the median test accuracy for a decentralized bird image classifier application reaches 81% with our proposed method within 80 rounds, whereas the baseline only reaches 46%.
#1219
Federated Graph Semantic and Structural Learning
Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenge. Most relative arts focus on traditional distributed tasks like images and voices, incapable of the graph structures. This paper firstly reveals that local client distortion is brought by both node-level semantics and graph-level structure. First, for node-level semantic, we find that contrasting nodes from distinct classes is beneficial to provide a well-performing discrimination. We pull the local node towards the global node of the same class and push them away from the global node of different classes. Second, we postulate that a well-structural graph neural network possesses similarity for neighbors due to the inherent adjacency relationships. However, aligning each node with adjacent nodes hinders discrimination due to the potential class inconsistency. We transform the adjacency relationships into the similarity distribution and leverage the global model to distill the relation knowledge into the local model, which preserves the structural information and discriminability of the local model. Empirical results on three graph datasets manifest the superiority of the proposed method over counterparts.
Thursday 24th August
11:45-12:45
CV: 3D Computer Vision (2/3)
#1762
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglect the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct our JointMAE by two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and model-specific decoders. On top of this, we further introduce two cross-modal strategies to boost the 3D representation learning, which are local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. By our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
#429
VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs
Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using the voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance field in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.
#345
OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking
Two-stage point-to-box network acts as a critical role in the recent popular 3D Siamese tracking paradigm, which first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. Towards these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network’s discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.
#3675
CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo
The core of Multi-view Stereo(MVS) is the matching process among reference and source pixels. Cost aggregation plays a significant role in this process, while previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs that fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to involve Transformer into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity caused by Transformer, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer(RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, Residual Regression Transformer(RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.
Thursday 24th August
11:45-12:45
CV: Transfer, Low-shot, Semi- and Un- supervised Learning
#2748
VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning
Unlike conventional zero-shot learning (CZSL) which only focuses on the recognition of unseen classes by using the classifier trained on seen classes and semantic embeddings, generalized zero-shot learning (GZSL) aims at recognizing both the seen and unseen classes, so it is more challenging due to the extreme training imbalance. Recently, some feature generation methods introduce metric learning to enhance the discriminability of visual features. Although these methods achieve good results, they focus only on metric learning in the visual feature space to enhance features and ignore the association between the feature space and the semantic space. Since the GZSL method uses semantics as prior knowledge to migrate visual knowledge to unseen classes, the consistency between visual space and semantic space is critical. To this end, we propose relational metric learning which can relate the metrics in the two spaces and make the distribution of the two spaces more consistent. Based on the generation method and relational metric learning, we proposed a novel GZSL method, termed VS-Boost, which can effectively boost the association between vision and semantics. The experimental results demonstrate that our method is effective and achieves significant gains on five benchmark datasets compared with the state-of-the-art methods.
#162
Decoupling with Entropy-based Equalization for Semi-Supervised Semantic Segmentation
Semi-supervised semantic segmentation methods are the main solution to alleviate the problem of high annotation consumption in semantic segmentation. However, the class imbalance problem makes the model favor the head classes with sufficient training samples, resulting in poor performance of the tail classes. To address this issue, we propose a Decoupled Semi-Supervise Semantic Segmentation (DeS4) framework based on the teacher-student model. Specifically, we first propose a decoupling training strategy to split the training of the encoder and segmentation decoder, aiming at a balanced decoder. Then, a non-learnable prototype-based segmentation head is proposed to regularize the category representation distribution consistency and perform a better connection between the teacher model and the student model. Furthermore, a Multi-Entropy Sampling (MES) strategy is proposed to collect pixel representation for updating the shared prototype to get a class-unbiased head. We conduct extensive experiments of the proposed DeS4 on two challenging benchmarks (PASCAL VOC 2012 and Cityscapes) and achieve remarkable improvements over the previous state-of-the-art methods.
#660
Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning
In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples from multi-level. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA. Our code is available at https://github.com/bupt-ai-cz/ProML.
#2038
LION: Label Disambiguation for Semi-supervised Facial Expression Recognition with Progressive Negative Learning
Semi-supervised deep facial expression recognition (SS-DFER) has recently attracted rising research interest due to its more practical setting of abundant unlabeled data. However, there are two main problems unconsidered in current SS-DFER methods: 1) label ambiguity, i.e., given labels mismatch with facial expressions; 2) inefficient utilization of unlabeled data with low-confidence. In this paper, we propose a novel SS-DFER method, including a Label DIsambiguation module and a PrOgressive Negative Learning module, namely LION, to simultaneously address both problems. Specifically, the label disambiguation module operates on labeled data, including data with accurate labels (clear data) and ambiguous labels (ambiguous data). It first uses clear data to calculate prototypes for all the expression classes, and then re-assign a candidate label set to all the ambiguous data. Based on the prototypes and the candidate label set, the ambiguous data can be relabeled more accurately. As for unlabeled data with low-confidence, the progressive negative learning module is developed to iteratively mine more complete complementary labels, which can guide the model to reduce the association between data and corresponding complementary labels. Experiments on three challenging datasets show that our method significantly outperforms the current state-of-the-art approaches in SS-DFER and surpasses fully-supervised baselines. Code will be available at https://github.com/NUM-7/LION.
#621
Compositional Zero-Shot Artistic Font Synthesis
Recently, many researchers have made remarkable achievements in the field of artistic font synthesis, with impressive glyph style and effect style in the results. However, due to less exploration in style disentanglement, it is difficult for existing methods to envision a kind of unseen style (glyph-effect) compositions of artistic font, and thus can only learn the seen style compositions. To solve this problem, we propose a novel compositional zero-shot artistic font synthesis gan (CAFS-GAN), which allows the synthesis of unseen style compositions by exploring the visual independence and joint compatibility of encoding semantics between glyph and effect. Specifically, we propose two contrast-based style encoders to achieve style disentanglement due to glyph and effect intertwining in the image. Meanwhile, to preserve more glyph and effect detail, we propose a generator based on hierarchical dual styles AdaIN to reorganize content-styles representations from structure to texture gradually. Extensive experiments demonstrate the superiority of our model in generating high-quality artistic font images with unseen style compositions against other state-of-the-art methods. The source code and data is available at moonlight03.github.io/CAFS-GAN/.
Thursday 24th August
11:45-12:45
Computer Vision (5/6)
#3352
Detecting Adversarial Faces Using Only Real Face Self-Perturbations
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
#435
ViT-CX: Causal Explanation of Vision Transformers
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been designed specially for ViTs thus far. They mostly use attention weights of the [CLS] token on patch embeddings and often produce unsatisfactory saliency maps. This paper proposes a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. Other characteristics of ViTs such as causal overdetermination are considered in the design of ViT-CX. The empirical results show that ViT-CX produces more meaningful saliency maps and does a better job revealing all important evidence for the predictions than previous methods. The explanation generated by ViT-CX also shows significantly better faithfulness to the model. The codes and appendix are available at https://github.com/vaynexie/CausalX-ViT.
#1137
Robust Image Ordinal Regression with Controllable Image Generation
Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.
#2479
U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation
Local context capturing has become the core factor for achieving leading performance in two-view correspondence learning. Recent advances have devised various local context extractors whereas typically adopting explicit neighborhood relation modeling that is restricted and inflexible. To address this issue, we introduce U-Match, an attentional graph neural network that has the flexibility to enable implicit local context awareness at multiple levels. Specifically, a hierarchy-aware graph representation (HAGR) module is designed and fleshed out by local context pooling and unpooling operations. The former encodes local context by adaptively sampling a set of nodes to form a coarse-grained graph, while the latter decodes local context by recovering the coarsened graph back to its original size. Moreover, an orthogonal fusion module is proposed for the collaborative use of HAGR module, which integrates complementary local and global information into compact feature representations without redundancy. Extensive experiments on different visual tasks prove that our method significantly surpasses the state-of-the-arts. In particular, U-Match attains an AUC at 5 degree threshold of 60.53% on the challenging YFCC100M dataset without RANSAC, outperforming the strongest prior model by 8.61 absolute percentage points. Our code is publicly available at https://github.com/ZizhuoLi/U-Match.
#2612
DFVSR: Directional Frequency Video Super-Resolution via Asymmetric and Enhancement Alignment Network
Recently, techniques utilizing frequency-based methods have gained significant attention, as they exhibit exceptional restoration capabilities for detail and structure in video super-resolution tasks. However, most of these frequency-based methods mainly have three major limitations: 1) insufficient exploration of object motion information, 2) inadequate enhancement for high-fidelity regions, and 3) loss of spatial information during convolution. In this paper, we propose a novel network, Directional Frequency Video Super-Resolution (DFVSR), to address these limitations. Specifically, we reconsider object motion from a new perspective and propose Directional Frequency Representation (DFR), which not only borrows the property of frequency representation of detail and structure information but also contains the direction information of the object motion that is extremely significant in videos. Based on this representation, we propose a Directional Frequency-Enhanced Alignment (DFEA) to use double enhancements of task-related information for ensuring the retention of high-fidelity frequency regions to generate the high-quality alignment feature. Furthermore, we design a novel Asymmetrical U-shaped network architecture to progressively fuse these alignment features and output the final output. This architecture enables the intercommunication of the same level of resolution in the encoder and decoder to achieve the supplement of spatial information. Powered by the above designs, our method achieves superior performance over state-of-the-art models on both quantitative and qualitative evaluations.
#3959
Orion: Online Backdoor Sample Detection via Evolution Deviance
Widely-used DNN models are vulnerable to backdoor attacks, where the backdoored model is only triggered by specific inputs but can maintain a high prediction accuracy on benign samples. Existing backdoor input detection strategies rely on the assumption that benign and poisoned samples are separable in the feature representation of the model. However, such an assumption can be broken by advanced feature-hidden backdoor attacks. In this paper, we propose a novel detection framework, dubbed Orion (online backdoor sample detection via evolution deviance). Specifically, we analyze how predictions evolve during a forward pass and find deviations between the shallow and deep outputs of the backdoor inputs. By introducing side nets to track such evolution divergence, Orion eliminates the need for the assumption of latent separability. Additionally, we put forward a scheme to restore the original label of backdoor samples, enabling more robust predictions. Extensive experiments on six attacks, three datasets, and two architectures verify the effectiveness of Orion. It is shown that Orion outperforms state-of-the-art defenses and can identify feature-hidden attacks with an F1-score of 90%, compared to 40% for other detection schemes. Orion can also achieve 80% label recovery accuracy on basic backdoor attacks.
Thursday 24th August
11:45-12:45
Game Theory and Economic Paradigms (2/2)
#1231
Rainbow Cycle Number and EFX Allocations: (Almost) Closing the Gap
Recently, some studies on the fair allocation of indivisible goods notice a connection between a purely combinatorial problem called the Rainbow Cycle problem and a fairness notion known as $\efx$: assuming that the rainbow cycle number for parameter $d$ (i.e. $\rainbow(d)$) is $O(d^\beta \log^\gamma d)$, we can find a $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(n^{\frac{\beta}{\beta+1}}\log^{\frac{\gamma}{\beta +1}} n)$ number of discarded goods \cite{chaudhury2021improving}.
The best upper bound on $\rainbow(d)$ is improved in a series of works to $O(d^4)$ \cite{chaudhury2021improving}, $O(d^{2+o(1)})$ \cite{berendsohn2022fixed}, and finally to $O(d^2)$ \cite{Akrami2022}.\footnote{We refer to the footnote at the end of the introduction for a short note on the result of \cite{Akrami2022}.} Also, via a simple observation, we have $\rainbow(d) \in \Omega(d)$ \cite{chaudhury2021improving}.
In this paper, we introduce another problem in extremal combinatorics. For a parameter $\ell$, we define the rainbow path degree and denote it by $\ech(\ell)$. We show that any lower bound on $\ech(\ell)$ yields an upper bound on $\rainbow(d)$. Next, we prove that $\ech(\ell) \in \Omega(\ell^2/\log n)$ which yields an almost tight upper bound of $\rainbow(d) \in \Omega(d \log d)$. This in turn proves the existence of $(1-\epsilon)$-$\efx$ allocation with $O_{\epsilon}(\sqrt{n \log n})$ number of discarded goods.
In addition, for the special case of the Rainbow Cycle problem that the edges in each part form a permutation, we improve the upper bound to $\rainbow(d) \leq 2d-4$. We leverage $\ech(\ell)$ to achieve this bound.
Our conjecture is that the exact value of $\ech(\ell) $ is $ \lfloor
\frac{\ell^2}{2} \rfloor -1$. We provide some experiments that support this conjecture. Assuming this conjecture is correct, we have $\rainbow(d) \in \Theta(d)$.
#997
Outsourcing Adjudication to Strategic Jurors
We study a scenario where an adjudication task (e.g., the resolution of a binary dispute) is outsourced to a set of agents who are appointed as jurors. This scenario is particularly relevant in a Web3 environment, where no verification of the adjudication outcome is possible, and the appointed agents are, in principle, indifferent to the final verdict. We consider simple adjudication mechanisms that use (1) majority voting to decide the final verdict and (2) a payment function to reward the agents with the majority vote and possibly punish the ones in the minority. Agents interact with such a mechanism strategically: they exert some effort to understand how to properly judge the dispute and cast a yes/no vote that depends on this understanding and on information they have about the rest of the votes. Eventually, they vote so that their utility (i.e., their payment from the mechanism minus the cost due to their effort) is maximized. Under reasonable assumptions about how an agent’s effort is related to her understanding of the dispute, we show that appropriate payment functions can be used to recover the correct adjudication outcome with high probability. Our findings follow from a detailed analysis of the induced strategic game and make use of both theoretical arguments and simulation experiments.
#4290
Measuring a Priori Voting Power in Liquid Democracy
We introduce new power indices to measure the a priori voting power of voters in liquid democracy elections where an underlying network restricts delegations. We argue that our power indices are natural extensions of the standard Penrose-Banzhaf index in simple voting games.
We show that computing the criticality of a voter is #P-hard even in weighted games with weights polynomially-bounded in the size of the instance.
However, for specific settings, such as when the underlying network is a bipartite or complete graph, recursive formulas can compute these indices for weighted voting games in pseudo-polynomial time.
We highlight their theoretical properties and provide numerical results to illustrate how restricting the possible delegations can alter voters’ voting power.
#3930
Matchings under One-Sided Preferences with Soft Quotas
Assigning applicants to posts in the presence of the preferences of applicants and quotas associated with posts is extensively investigated. For a post, lower quota guarantees, and upper quota limits the number of applicants assigned to it. Typically, quotas are assumed to be fixed, which need not be the case in practice. We address this by introducing a soft quota setting, in which every post is associated with two values – lower target and upper target which together denote a range for the intended number of applicants in any assignment. Unlike the fixed quota setting, we allow the number of applicants assigned to a post to fall outside the range. This leads to assignments with deviation. Here, we study the problem of computing an assignment that has two orthogonal optimization objectives – minimizing the deviation (maximum or total) w.r.t. soft quotas and ensuring optimality w.r.t. preferences of applicants (rank-maximality or fairness). The order in which these objectives are considered, the different possibilities to optimize deviation combined with the well-studied notions of optimality w.r.t. preferences open up a range of optimization problems of practical importance. We present efficient algorithms based on flow-networks to solve these optimization problems.
#4804
Incentive-Compatible Selection for One or Two Influentials
Selecting influentials in networks against strategic manipulations has attracted many researchers’ attention and it also has many practical applications. Here, we aim to select one or two influentials in terms of progeny (the influential power) and prevent agents from manipulating their edges (incentive compatibility). The existing studies mostly focused on selecting a single influential for this setting. Zhang et al. [2021] studied the problem of selecting one agent and proved an upper bound of 1/(1+ln2) to approximate the optimal selection. In this paper, we first design a mechanism to actually reach the bound. Then, we move this forward to choosing two agents and propose a mechanism to achieve an approximation ratio of (3+ln2)/(4(1+ln2)) (approx. 0.54).
Thursday 24th August
11:45-12:45
MAS: Multi-agent Learning (2/2)
#2451
Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning
Sharing intentions is crucial for efficient cooperation in communication-enabled multi-agent reinforcement learning. Recent work applies static or undirected graphs to determine the order of interaction. However, the static graph is not general for complex cooperative tasks, and the parallel message-passing update in the undirected graph with cycles cannot guarantee convergence. To solve this problem, we propose Deep Hierarchical Communication G