This work proposes an alternative kernel decomposition for kernel machines with indefinite kernels. The original KSVM paper (SVM in Kreĭn spaces) relies on an eigen-decomposition; our proposal avoids this decomposition. We explain how this helps in designing an algorithm that does not require computing the full kernel matrix. Finally, we illustrate the good behavior of the proposed method compared to KSVM.

Adversarial examples are a challenging open problem for deep neural networks. In this paper, we propose to add a penalization term that forces the decision function to be flat in some regions of the input space, so that it becomes, at least locally, less sensitive to attacks. Our proposal is theoretically motivated; a first set of carefully conducted experiments shows that it behaves as expected when used alone, and seems promising when coupled with adversarial training.

We propose an improvement over the algorithm named KSVM, which solves SVM with indefinite kernels. The new algorithm is referred to as TrIK-SVM. Its main advantage over KSVM is that it produces sparse solutions, similarly to SVM with positive definite kernels. From this new algorithm, we derive an application to multiple kernel learning. In the MKL setting, all candidate kernels have to be positive definite, and they are gathered through a convex linear combination so that the final kernel is also positive definite. Here we take advantage of TrIK-SVM to propose a new MKL approach without any positivity constraint: any symmetric kernels can be linearly combined with positive or negative coefficients.
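The signed combination at the heart of this MKL variant can be sketched in a few lines. This is an illustrative construction, not the TrIK-SVM solver itself; the kernels and weights below are arbitrary choices made for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))

def rbf(X, gamma):
    """Gaussian (RBF) kernel matrix on X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# candidate kernels: two RBF widths and a linear kernel
kernels = [rbf(X, 0.1), rbf(X, 1.0), X @ X.T]

# signed weights: no positivity constraint, unlike standard MKL
d = np.array([0.7, -0.4, 0.2])
K = sum(w * Km for w, Km in zip(d, kernels))   # symmetric, possibly indefinite

eigs = np.linalg.eigvalsh(K)
print(eigs.min())   # negative eigenvalues may appear: K can leave the PSD cone
```

Such a combined kernel is exactly the kind of symmetric, indefinite similarity that TrIK-SVM is designed to handle.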

This talk will be about Generative Adversarial Networks (GANs) and a recently introduced dataset, the Cerema AWP (Adverse Weather Pedestrian). We want to assess the capacity of GANs to generate a particular element, in our case a pedestrian, at a specified place. The Cerema AWP database is well suited to that task since for each image we have the bounding box of the pedestrian. The Cerema AWP dataset is an image dataset that was produced in a special installation, a tunnel in which different weather conditions can be artificially created. Since that database was originally created for pedestrian detection, there is a pedestrian in each image. The dataset is annotated according to the weather (10 different conditions), the pedestrian (5 different people), and their clothes (each pedestrian appears with two different outfits). Additional information, such as the pedestrian's direction or bounding box, is also available. The controlled environment and this detailed information make this database attractive for our purpose. Indeed, the background being fixed, it appears to be a simpler version of the problem we would get with different backgrounds, perspectives, or other uncontrolled variations. Since most of the variation in the Cerema AWP database is controlled and associated with labels, we can study generation with all the conditions or restricted to a subset of weather or other conditions. In a previous study using a standard GAN, generated images presented a mixture of weathers in a single output, showing that the generative network had trouble matching the dataset distribution. This problem was solved by conditioning on the weather. Now the generated images have a uniform weather, but a problem persists: there are no pedestrians in the images. We will present the architectures, the ways of conditioning, and other tricks that help the generator focus on generating pedestrians while producing realistic images.

This paper presents the recently published Cerema AWP (Adverse Weather Pedestrian) dataset for various machine learning tasks, together with its exports in machine-learning-friendly formats. We explain why this dataset is interesting (mainly because it is a highly controlled and fully annotated image dataset) and present baseline results for various tasks. Moreover, we follow the very recent suggestion of datasheets for datasets, trying to standardize all the available information about the dataset, with a transparency objective.

This special issue of Neurocomputing presents 16 original articles that are extended versions of selected papers from the 24th European Symposium on Artificial Neural Networks (ESANN), a major event for researchers in the fields of artificial neural networks, machine learning, computational intelligence, and related topics. This single-track conference occurs annually in Bruges, Belgium, a UNESCO World Heritage Site with one of the most beautiful medieval city centers in Europe. It is organised jointly by UCL (Université catholique de Louvain, Louvain-la-Neuve) and KU Leuven (Katholieke Universiteit Leuven) and is steered by Prof. Michel Verleysen from UCL. In addition to regular sessions, the conference welcomes special sessions organised by renowned scientists in their respective fields. These sessions focus on particular topics, such as medical applications, physics, deep learning, indefinite proximity learning, information visualisation, incremental learning and advances in learning with kernels. The contributions in this special issue show that ESANN covers a broad range of topics in neural computing and neuroscience, from theoretical aspects to state-of-the-art applications. More than 120 researchers from 20 countries participated in the 24th ESANN in April 2016. Around 100 oral and poster communications were presented that year. Based on the reviewers' and special session organisers' recommendations, as well as on the quality of the oral presentations at the conference, a number of authors were invited to submit an extended version of their conference paper for this special issue of Neurocomputing. All extended manuscripts were thoroughly reviewed once more by at least two independent experts, and the 16 articles presented in this volume were accepted for publication.

In the presented experimental study, we compare the classification power of two variations of the same graph kernel. One variation is designed to produce positive semi-definite kernel matrices (K_{matching}) and is an approximation of the other one, which is indefinite (K_{max}). We show that, using tools adapted to indefiniteness (KSVM), the original indefinite kernel outperforms its positive definite approximation. We also propose a slight improvement of the KSVM method, which produces non-sparse solutions, by adding a fast post-processing step that yields a sparser solution.

This paper presents a theoretical foundation for an SVM solver in Kreĭn spaces. Up to now, all methods have been based either on matrix correction, on non-convex minimization, or on feature-space embedding. Here we justify and evaluate a solution that uses the original (indefinite) similarity measure, in the original Kreĭn space. This solution is the result of a stabilization procedure. We establish the correspondence between the stabilization problem (which has to be solved) and a classical SVM based on minimization (which is easy to solve), and we provide simple equations to go from one to the other, in both directions. This link between the stabilization and minimization problems is the key to obtaining a solution in the original Kreĭn space. Using KSVM, one can solve SVM with usually troublesome kernels (large negative eigenvalues or large numbers of negative eigenvalues). We report experiments showing that our algorithm, KSVM, outperforms all previously proposed approaches for dealing with indefinite matrices in SVM-like kernel methods.

Recent technological progress has led to a huge increase in the number of 3D models available in digital form. Numerous applications have been developed to deal with this amount of information, especially for 3D shape retrieval. One of the main issues is to bridge the semantic gap between the shapes desired by users and the shapes returned by retrieval methods. In this paper, we propose an algorithm to address this issue. First, the user gives a semantic request. Second, a fuzzy 3D-shape generator sketches out suitable 3D shapes. Those shapes are filtered by the user or a learning machine to select the ones that match the semantic query. Then, we use a state-of-the-art retrieval method to return real-world 3D shapes that match this semantic query. We present results from an experiment: three semantic concepts are learned, and 3D shapes from the SHREC'07 database that match each concept are retrieved using our algorithm. The results are good and promising.

This paper proposes the adaptation of Support Vector Data Description (SVDD) to the multiple kernel case (MK-SVDD), based on SimpleMKL. It also introduces a variant called Slim-MK-SVDD that is able to produce a tighter frontier around the data. For the sake of comparison, the equivalent methods are also developed for One-Class SVM, known to be very similar to SVDD for certain kernel shapes. These algorithms are illustrated in the context of 3D-shape filtering and outlier detection. For the 3D-shape problem, the objective is to select a sub-category of 3D shapes, each sub-category being learned with our algorithm in order to create a filter. For outlier detection, we apply the proposed algorithms to the unsupervised as well as the supervised case.

We propose a generic framework to handle weak classifiers that are missing at testing time in a boosted cascade. The main contribution is a probabilistic formulation of the cascade structure that accounts for the uncertainty introduced by the missing weak classifiers. This formulation involves two problems: (1) approximating the posterior probabilities at each level and (2) computing thresholds on these probabilities to make a decision. Both problems are studied, and several solutions are proposed and evaluated. The method is then applied to two popular computer vision applications: detecting occluded faces and detecting faces in a pose different from the one learned. Experimental results on conventional databases compare the proposed strategies to baseline ones.

We consider the problem of generating learning data in the context of active learning for classification. First, we recall theoretical results proposing discrepancy as a criterion for generating samples in regression, and show that these results about low-discrepancy sequences do not carry over to classification problems. Second, we give a theoretical argument in favor of dispersion as a criterion for generating data. Finally, we present numerical experiments that agree well with this theory.
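To make the dispersion criterion concrete, here is a minimal sketch (not the paper's procedure) that approximates the dispersion of a design, i.e. the largest distance from any location in the domain to its nearest training point, by Monte-Carlo probing. A regular grid typically has lower dispersion than the same number of i.i.d. uniform points:

```python
import numpy as np

def dispersion(points, n_probe=10000, seed=0):
    """Approximate the dispersion (covering radius) of a point set in [0,1]^d:
    the largest distance from any probed location to its nearest sample."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    probes = rng.random((n_probe, d))                      # dense probe of the unit cube
    dists = np.sqrt(((probes[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    return dists.min(axis=1).max()                         # worst-covered probe

rng = np.random.default_rng(1)
random_pts = rng.random((64, 2))                           # 64 i.i.d. uniform points
grid = np.stack(np.meshgrid(*[np.linspace(0.0625, 0.9375, 8)] * 2),
                -1).reshape(-1, 2)                         # centered 8x8 regular grid
print(dispersion(random_pts), dispersion(grid))            # grid covers the square better
```

The grid's covering radius is bounded by half the cell diagonal (about 0.088 here), whereas random sampling leaves larger empty regions.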

Learning SVMs with non-positive kernels is a problem that has been addressed in recent years, but it is not really solved: indeed, either the kernel is corrected (as a pre-treatment or via a modified learning scheme), or it is used with well-chosen parameters that lead to almost positive-definite kernels. In this work, we aim at solving the actual problem induced by non-positive kernels, i.e. solving the stabilization system in the space associated with the non-positive kernel. We first describe this stabilization system, then present a simple algorithm based on the eigen-decomposition of the kernel matrix. While providing satisfying solutions, the proposed algorithm shows limitations in terms of memory storage and computational effort. The direct resolution remains an open question.
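The eigen-decomposition route mentioned above can be sketched as follows. This is a minimal illustration of the flip-and-train construction on a toy indefinite kernel, not the authors' implementation; it relies on scikit-learn's precomputed-kernel SVC, and the sigmoid kernel and data are arbitrary examples:

```python
import numpy as np
from sklearn.svm import SVC

def ksvm_flip_fit(K, y, C=1.0):
    """Train on an indefinite kernel matrix K by flipping negative eigenvalues,
    then map the dual solution back to the original (Krein) geometry."""
    w, U = np.linalg.eigh(K)                      # K = U diag(w) U^T, w may be negative
    K_flip = (U * np.abs(w)) @ U.T                # positive semi-definite surrogate
    clf = SVC(kernel="precomputed", C=C).fit(K_flip, y)
    alpha = np.zeros(len(y))
    alpha[clf.support_] = clf.dual_coef_.ravel()  # signed dual coefficients
    S = (U * np.sign(w)) @ U.T                    # signature operator
    return S @ alpha, clf.intercept_[0]           # since K S = K_flip, K @ (S a) = K_flip @ a

# toy problem with a sigmoid kernel, which is indefinite in general
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)
K = np.tanh(0.5 * X @ X.T)
alpha, b = ksvm_flip_fit(K, y)
pred = np.sign(K @ alpha + b)                     # decision uses the ORIGINAL kernel
```

The key point is that the full eigen-decomposition costs O(n³) time and O(n²) memory, which is exactly the limitation the abstract refers to.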

We are interested in how to generate a finite number of training points in an active learning setting. This problem arises in various contexts, such as the optimization of meta-models in engineering, function approximation, or the modeling of interactions between variables without prior knowledge. It is well known that the more data there are, the better the model. However, obtaining data can be costly or destructive, notably in chemistry, geology, biology, or computer simulations. Suppose the experimenter can obtain only n training points. How should the experimental design be generated without prior knowledge? The solution is to spread the points uniformly over the whole study space. But how should uniform points be generated? This question, which seems harmless at first glance, is actually rather complex. A natural solution is to generate a sequence of points at random. Another solution is to use a regular grid. But the problem remains open when the number of available points is not suited to such a layout.

We wish to generate example sets suited to shape approximation (classification) problems, without any prior knowledge about the shape to be learned. We first show that the theoretical results favoring low-discrepancy sequences for regression problems are ill-suited to classification problems. We then give theoretical arguments and simulation results showing that the dispersion of the training points is the relevant criterion to minimize in order to optimize learning performance in classification.

For a child, learning is a complex process that consists of acquiring or developing certain competences on the basis of multiple experiences. For a machine, this learning process can be reduced to examples or observations that are used to improve performance. Machine learning can be seen as the optimization of criteria defined on examples: the higher the number of examples, the better the learning. In terms of optimization, this learning process raises several specific problems. How are the criteria to be optimized defined? How can large amounts of data be managed? Which algorithms are efficient in this context? To deal with these problems, neural network approaches use non-convex criteria combined with gradient descent methods. This procedure leads to several difficulties linked to the non-convexity of the criteria. The key to the success of kernel-based methods (which came about a decade after the introduction of neural networks) is their capacity to express the learning problem as a large-scale (convex) quadratic programming problem. Kernel-based methods often lead to sparse solutions, i.e. a large number of their components equal zero. Thanks to this particularity, learning algorithms can solve large-scale problems in reasonable time. Solving this type of problem currently takes about one day of computation on a single processor for 8 million unknowns. Among these 8 million unknowns, only 8,000 to 20,000 variables are nonzero, depending on the complexity of the problem.

For a child, learning is a complex process that aims at acquiring or developing certain abilities from diverse experiences. For a machine, it comes down to using examples or observations to improve its performance. Machine learning can thus be seen as the optimization, within a sufficiently large set, of a criterion defined from examples; the more examples there are, the better the learning should be. From the optimization point of view, this raises specific problems: how to define the criterion to optimize, how to handle the mass of data, how to find an efficient algorithm in this setting? To address these problems, neural-network-type learning methods propose to use non-convex criteria together with gradient descent methods. This procedure entails many practical difficulties related to the non-convexity of the criterion being optimized. One key to the success of kernel methods, achieved about a decade after the introduction of neural networks, is their ability to formulate the learning problem as a very particular, large-scale (convex) quadratic programming problem. Indeed, with kernel methods, the sought solution is often "sparse": a very large number of its components are zero. Taking advantage of this specificity, learning algorithms can nowadays solve very large quadratic programming problems in reasonable time: on average one day of computation on a single-processor machine for 8 million unknowns. Of these 8 million unknowns, only on the order of 8,000 to 20,000 are nonzero, depending on the complexity of the problem.

This paper focuses on the use of Support Vector Machines (SVM) when learning from data located on incomplete grids. We identify two typical behaviours to be avoided, which we call holes. Holes are regions of the space with no training data where the decision changes. We propose a novel algorithm that aims at preventing holes from appearing; it automatically selects the local kernel bandwidth during training. We provide hard-margin and soft-margin versions and several experimental results. Even though our method is designed for a specific application, it turns out that it can be applied to more general problems.

In a recently published JMLR paper, the authors present an algorithm for SVM called Core Vector Machines (CVM) and illustrate its performance through comparisons with other SVM solvers. After reading the CVM paper, we were surprised by some of the reported results. In order to clarify the matter, we decided to reproduce some of the experiments. It turns out that, to some extent, our results contradict those reported. Reasons for these different behaviors are given through an analysis of the stopping criterion.

In this chapter we present the combination of two approaches to build a very large SVM. The first method, from [?], proposes a strategy for handling invariances in SVMs. It is based on the well-known idea that small deformations of examples should not change their class. The deformed samples are selected or discarded during learning (this is selective sampling). The second approach is LASVM, an efficient online algorithm (each training point is seen only once) that also uses selective sampling. We present state-of-the-art results obtained on a handwritten digit recognition problem with 8 million points on a single processor. This work also demonstrates that online SVMs can effectively handle very large databases.

This paper presents the ν-SVM and ν-SVR full regularization paths, along with a leave-one-out-inspired stopping criterion and an efficient implementation. In the ν-SVR method, two parameters are provided by the user: the regularization parameter C and ν, which sets the width of the ε-tube. In the classical ν-SVM method, the parameter ν is a lower bound on the fraction of support vectors in the solution. Building on previous works, extensions of the regularization paths for SVM and SVR are proposed, allowing the solution path to be computed automatically by varying ν or the regularization parameter.
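The role of ν can be checked empirically. The sketch below uses scikit-learn's NuSVC (not the paper's path algorithm) on a toy problem to verify that the fraction of support vectors never falls below ν:

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # a simple, roughly balanced task

fractions = []
for nu in (0.1, 0.3, 0.6):
    clf = NuSVC(nu=nu, kernel="rbf", gamma=1.0).fit(X, y)
    frac_sv = clf.n_support_.sum() / len(X)    # fraction of support vectors
    fractions.append(frac_sv)
    print(f"nu={nu}: support-vector fraction = {frac_sv:.2f}")
```

Raising ν thus directly controls solution sparsity, which is why following the path in ν is an appealing alternative to a grid search.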

This article deals with the problem of recognizing affective states from physical and physiological wearable sensors. Given the complex nature of the relationship between the available signals and the affective states to be detected, we propose to use a statistical learning method. We begin with a discussion of the state of the art in statistical learning algorithms and their application to affective state recognition. Then a framework is presented to compare different learning algorithms and methodologies. Using the results of this pre-study, a global architecture is proposed for a real-time embedded recognition system. Instead of directly recognizing the affective states, we propose to first detect abrupt changes in the incoming signal, in order to segment it and then label each segment. The interest of the proposed method is demonstrated on two real affective state recognition tasks.

The goal of this work is to show that it is possible to perform classification with Support Vector Machines (SVM) on very large databases (millions of examples, hundreds of features, and about ten classes). To handle this mass of data, we propose to use an "online" algorithm in which the examples are presented one after another. This approach allows both a fast update of the solution (which depends on only one example at a time) and an efficient management of the training set (which does not have to fit entirely in memory). The target application is character recognition, taking invariances in the data into account. To this end, we adapt the LASVM algorithm (an online SVM method), drawing on [?], to integrate a priori knowledge about invariance.

The automatic analysis of learners' productions is a central problem in the field of EIAH (computer-based learning environments) and more specifically of ITS (Intelligent Tutoring Systems). In this article, we propose a generic tool for the assisted analysis of learners' productions. This tool should allow an EIAH designer to easily instantiate an assessment module within a particular EIAH. The instantiated tool then allows an automatic analysis of the learner's production, while giving the tutor a global view of the level of the learners taking the course. The article presents the approach in detail, together with some examples of how the tool is used.

Making applications aware of their context opens up a significant number of perspectives in human-computer interaction. The large number of studies aiming at determining how to use context shows the need for context retrieval. By analyzing the machine learning requirements of each task related to context detection, we encountered a certain number of limits. Consequently, most of the work carried out during this thesis relates to machine learning from a general point of view. The goal is to achieve autonomous and enduring learning. By autonomous, we mean learning that does not require the intervention of a specialist; this implies using methods that can self-tune and be used online. By enduring, we refer to a realistic use of the applications, i.e. real-time, therefore fast, online and stable, for a very significant amount of data. Because SVMs give accurate results, we focused our work on this method. However, SVMs are far from fulfilling the requirements of autonomous and enduring learning. Autonomous learning is not only subject to the need for an efficient solver; it is also limited by the presence of hyper-parameters. In the case of SVMs, these hyper-parameters are relatively few. However, a single one is enough to make a method dependent on a form of supervision, which contradicts either the need for online training or the objective of independence from human intervention. We studied this problem via regularization paths. Regularization paths make it possible to know all the solutions of a problem with respect to the bias-variance trade-off. For SVMs, this trade-off is regulated by one of the hyper-parameters, and we thus use the regularization path to obtain an automatic adjustment of this hyper-parameter. We have not yet reached the stage of a push-button SVM, but we show that not all the limits of SVMs are insurmountable.
Regarding database size, we implemented the largest SVM to date on a single processor, with 80 million points in dimension 784, using the online method LASVM.

Often, in pattern recognition, complementary knowledge is available that could be used to improve the performance of the recognition system. Part of this knowledge concerns invariances, in particular when dealing with image or voice data. Many approaches have been proposed to incorporate invariances into pattern recognition systems. Some of these approaches require a pre-processing phase, while others integrate the invariances into the algorithms. We present a unifying formulation of the problem of incorporating invariances into a pattern recognition classifier, and we extend the SimpleSVM algorithm to handle invariances efficiently.
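One classic pre-processing approach of the kind mentioned above generates "virtual examples" by applying small invariant transformations to the training set. The sketch below illustrates that general idea with 1-pixel translations on toy images; it is not the SimpleSVM extension itself, and the image size and shifts are arbitrary:

```python
import numpy as np
from scipy.ndimage import shift

def virtual_examples(X_images, y, shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Augment a set of images with small translations (virtual examples);
    labels are copied since translation should not change the class."""
    Xv, yv = [X_images], [y]
    for dy, dx in shifts:
        Xv.append(np.stack([shift(im, (dy, dx), order=0, cval=0.0)
                            for im in X_images]))
        yv.append(y)
    return np.concatenate(Xv), np.concatenate(yv)

imgs = np.zeros((10, 8, 8))
imgs[:, 3:5, 3:5] = 1.0            # toy 8x8 "digits": a 2x2 bright patch
labels = np.arange(10) % 2
Xa, ya = virtual_examples(imgs, labels)
print(Xa.shape, ya.shape)          # 5x more examples than the original set
```

The augmented set can then be fed to any classifier; integrating the invariances directly into the algorithm, as the abstract proposes, avoids this blow-up in training-set size.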

We are interested in a low-level part of user modeling: detecting changes in the user's context. We present a method to detect, online, changes in the context of a user equipped with non-invasive sensors. Our aim is to provide, in real time, series of unlabeled contexts that can be classified and analyzed at a higher level.

This paper presents an algorithm for detecting abrupt changes in signals, applied to physiological data. We are working on automatic context detection through wearable computers and sensors integrated in clothes. Among the many problems that exist in this area, we are particularly interested in the automatic detection of changes in the state of the user, in order to develop context-aware applications. For this purpose, the algorithm described here is based on one-class support vector machines. We also illustrate our algorithm on two experiments in which data are collected through biological sensors and accelerometers.
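As a hedged illustration of the general idea (not the paper's exact algorithm), one can slide a window over the signal, fit a one-class SVM on the recent past, and flag a change when several consecutive new samples fall outside the learned support; window size, ν, and k below are arbitrary choices:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def detect_changes(signal, window=50, nu=0.05, k=5):
    """Flag index t as a change when the k samples after t all fall outside
    the support of the previous `window` samples."""
    changes, t = [], window
    while t + k <= len(signal):
        past = signal[t - window:t].reshape(-1, 1)
        ocsvm = OneClassSVM(nu=nu, gamma="scale").fit(past)
        preds = ocsvm.predict(signal[t:t + k].reshape(-1, 1))
        if np.all(preds == -1):        # k consecutive outliers => abrupt change
            changes.append(t)
            t += window                # restart the window after the change
        else:
            t += 1
    return changes

rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.0, 0.3, 200),   # regime 1
                         rng.normal(3.0, 0.3, 200)])  # abrupt mean shift at t=200
print(detect_changes(signal))
```

Requiring k consecutive outliers keeps the per-step false-alarm probability low while still reacting within a few samples of a genuine regime change.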

Automatic context detection through wearable computers and sensors integrated in clothes is the question we address in this paper. Among the many problems that exist in this area, we are particularly interested in the automatic detection of changes in the state of the user, in order to develop context-aware applications. This paper presents a machine learning method for abrupt change detection, based on one-class support vector machines. It also illustrates our algorithm on two experiments in which data are collected by biological sensors and accelerometers.


While SVMs (Support Vector Machines) are now considered one of the best learning methods, they are still regarded as slow. Here we propose a Matlab toolbox that enables the use of SVMs in a fast and simple way, thanks to a projected gradient method well adapted to the problem: SimpleSVM. We chose to implement this algorithm in the Matlab environment since it is user-friendly and efficient (it uses the ATLAS, Automatically Tuned Linear Algebra Software, library). A comparison with the state of the art in this field, SMO (Sequential Minimal Optimization), shows that in some cases our solution is faster and less complex. To illustrate how fast and simple our method is, we report results on the MNIST database: a satisfying solution was computed in a rather short time (one and a half hours on a Linux PC to compute 45 binary classifiers, with 60,000 samples in dimension 576). Moreover, we show how this algorithm can be extended to problems involving invariances, breaking the problem down into small pieces so that the solution can be obtained by running Invariant SimpleSVM only once.

A current problem, induced by the growing complexity of technological devices, is a system's ability to adapt to the user's situation without the user having to worry about it. With this in mind, we propose to determine a person's locomotion behavior from non-intrusive sensors (accelerometers). The overall method consists in creating features likely to carry information from the sensor signals, keeping the best ones, and applying a classification method. The step of the process we focused on is the selection of relevant variables: how can we keep only the useful features? To answer this question, we compared our approach (variable selection through a global approach (Grandvalet & Canu, 2002)) with the reference methods in the field. Our methods achieve up to 99% correct classification offline. These results make it possible to envisage an extension to online mode for an application in the field of context detection.