Gaëlle Loosli

Publication list


[Seck 2021 a]
Seck, I., Loosli, G., and Canu, S. (2021). Linear Program Powered Attack. Article at IJCNN - International Joint Conference on Neural Networks. [ bib ]
Finding the exact robust test error is a good way to compare the robustness of neural networks, but it is a difficult task even on small networks and datasets like MNIST. However, finding reasonable lower and upper bounds is possible and can be done using either complete methods or attacks. On the one hand, complete methods such as Mixed Integer Programs (MIP) give the exact robust test accuracy but are time-consuming. On the other hand, attacks are usually fast but tend to perform badly against robust networks and thus give loose lower bounds on the robust test error. The purpose of this paper is to present a novel attack method that is both fast and gives better lower bounds than previous attacks. This method exploits the algebraic properties of networks with piecewise linear activation functions to partition the input space in such a way that, for each subset of that partition, finding the locally optimal adversarial example is done by solving a linear program. Moving from one subset to another is done using classical gradient-based attack tools. To evaluate the quality of the produced adversarial examples, we compare our lower bound on the robust test error to those previously found. The results are satisfying in the sense that the method does better than previous lower bounds on several models and finds adversarial examples that the MIP failed to expose before reaching its time limit.


[Loosli 2019 b]
Loosli, G. (2019). Non convex combinations in Multiple Kernel Learning. Poster at Women in Machine Learning (WiML) workshop, co-located with NeurIPS 2018. [ bib ]
Multiple Kernel Learning is an elegant way of combining several kernels, but it is restricted to a convex linear combination. Taking advantage of theoretical and practical tools provided by works on learning in Krein spaces, we introduce here non-convex combinations of kernels for MKL, solved in Krein spaces. The advantages are: (1) it can mix distances and similarities directly; (2) it can handle indefinite kernels; (3) it can be solved at the cost of a couple of SVMs using the EasyMKL formulation.
[Loosli 2019]
Loosli, G. (2019). TrIK-SVM : an alternative decomposition for kernel methods in Krein spaces. Article at ESANN - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2019. [ bib ]
This work proposes an alternative kernel decomposition in the context of kernel machines with indefinite kernels. The original KSVM paper (SVM in Krein spaces) uses the eigen-decomposition; our proposition avoids this decomposition. We explain how it can help in designing an algorithm that does not require computing the full kernel matrix. Finally, we illustrate the good behavior of the proposed method compared to KSVM.
[Seck and Loosli and Canu, 2019]
Seck, I., Loosli, G., and Canu, S. (2019). L1-norm double backpropagation adversarial defense. Article at ESANN - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2019. [ bib ]
Adversarial examples are a challenging open problem for deep neural networks. We propose in this paper to add a penalization term that forces the decision function to be flat in some regions of the input space, such that it becomes, at least locally, less sensitive to attacks. Our proposition is theoretically motivated and shows on a first set of carefully conducted experiments that it behaves as expected when used alone, and seems promising when coupled with adversarial training.


[Loosli 2018]
Loosli, G. (2018). TrIK-SVM : more efficiency for learning in Krein spaces. Application to non-convex combinations in MKL. Poster at Women in Machine Learning (WiML) workshop, co-located with NeurIPS 2018. [ bib ]
We propose an improvement over the algorithm named KSVM, which solves SVM with indefinite kernels. The new algorithm is referred to as TrIK-SVM. Its main advantage compared to KSVM is that it produces sparse solutions, similarly to SVM with positive definite kernels. From this new algorithm, we derive an application to multiple kernel learning. In the MKL setting, all candidate kernels have to be positive definite and they are gathered using a convex linear combination, such that the final kernel is also positive definite. Here we take advantage of TrIK-SVM to propose a new MKL approach without any positivity constraints: any symmetric kernels can be linearly combined with positive or negative coefficients.
[Seck and Loosli, 2018]
Seck, I., and Loosli, G. (2018). Generative adversarial nets and Cerema AWP dataset. Talk at ENBIS 18. [ bib ]
This talk will be about Generative Adversarial Networks (GANs) and a recently introduced dataset, the Cerema AWP (Adverse Weather Pedestrian). We want to assess the capacity of GANs to generate a particular element, in our case a pedestrian, at a specified place. The Cerema AWP database is a good database for that task since for each image we have the bounding box of the pedestrian. The Cerema AWP dataset is an image dataset that was produced in a special installation, a tunnel in which different weather conditions can be artificially created. Since that database was originally created for pedestrian detection, there is a pedestrian on each image. The dataset is annotated according to the weather (10 different weathers), the pedestrian (5 different) and their clothes (each pedestrian appears in two different outfits). Additional information such as the pedestrian's direction or the bounding box of the pedestrian is available. The controlled environment and this detailed information make the database attractive for our purpose. Indeed, the background being fixed, it seems to be a simpler version of the problem we would get with different backgrounds, perspectives or other uncontrolled variations. In the Cerema AWP database, most of the variation being controlled and associated with labels, we can study the generation with all the conditions or according to a subset of weather or other conditions. In a previous study using a standard GAN, generated images presented a mixture of weather on a single output, showing that the generative network had trouble matching the dataset distribution. This problem was solved by conditioning on the weather. Now the generated images have a uniform weather but a problem persists: we don't have pedestrians on the images. We are going to present the architectures, the ways of conditioning and other tricks to help the generator focus on the generation of pedestrians while generating realistic images.
[Seck et al., 2018a]
Seck, I., Dahmane, K., Duthon, P., and Loosli, G. (2018). Baselines and a datasheet for the cerema awp dataset. In Conférence d'Apprentissage CAp. [ bib | DOI | http | code ]
This paper presents the recently published Cerema AWP (Adverse Weather Pedestrian) dataset for various machine learning tasks and its exports in machine-learning-friendly formats. We explain why this dataset can be interesting (mainly because it is a highly controlled and fully annotated image dataset) and present baseline results for various tasks. Moreover, we decided to follow the very recent suggestions of datasheets for datasets, trying to standardize all the available information about the dataset, with a transparency objective.


[Aiolli et al., 2017]
Aiolli, F., Bonnet-Loosli, G., and Hérault, R. (2017). Advances in artificial neural networks, machine learning and computational intelligence. Neurocomputing, 268:1--3. Editorial. [ bib | DOI | http ]
This special issue of Neurocomputing presents 16 original articles that are extended versions of selected papers from the 24th European Symposium on Artificial Neural Networks (ESANN), a major event for researchers in the fields of artificial neural networks, machine learning, computational intelligence, and related topics. This single-track conference occurs annually in Bruges, Belgium, a UNESCO World Heritage Site with one of the most beautiful medieval city centers in Europe. It is organised in collaboration by UCL (Université Catholique de Louvain, Louvain-la-Neuve) and KULeuven (Katholieke Universiteit Leuven) and is steered by Prof. Michel Verleysen from UCL. In addition to regular sessions, the conference welcomes special sessions organised by renowned scientists in their respective fields. These sessions focus on particular topics, such as medical applications, physics, deep learning, indefinite proximity learning, information visualisation, incremental learning and advances in learning with kernels. The contributions in this special issue show that ESANN covers a broad range of topics in neural computing and neuroscience, from theoretical aspects to state-of-the-art applications. More than 120 researchers from 20 countries participated in the 24th ESANN in April 2016. Around 100 oral and poster communications were presented this year. Based on the reviewers' and special session organisers' recommendations, as well as on the quality of the oral presentations at the conference, a number of authors were invited to submit an extended version of their conference paper for this special issue of Neurocomputing. All extended manuscripts were thoroughly reviewed once more by at least two independent experts and the 16 articles presented in this volume were accepted for publication.


[Loosli, 2016]
Loosli, G. (2016). Study on the loss of information caused by the "positivation" of graph kernels for 3d shapes. In 24th European Symposium on Artificial Neural Networks Bruges, Belgium, April 27-28-29. [ bib | pdf ]
In the presented experimental study, we compare the classification power of two variations of the same graph kernel. One variation is designed to produce positive semi-definite kernel matrices (Kmatching) and is an approximation of the other one, which is indefinite (Kmax). We show that, using tools adapted to deal with indefiniteness (KSVM), the original indefinite kernel outperforms its positive definite approximation. We also propose a slight improvement of the KSVM method, which produces non-sparse solutions, by adding a fast post-processing step that gives a sparser solution.
[Loosli et al., 2016]
Loosli, G., Canu, S., and Ong, C. S. (2016). Learning SVM in Kreĭn spaces. IEEE Trans. Pattern Anal. Mach. Intell., 38(6):1204--1216. [ bib | DOI | http | code ]
This paper presents a theoretical foundation for an SVM solver in Kreĭn spaces. Up to now, all methods have been based on matrix correction, on non-convex minimization, or on feature-space embedding. Here we justify and evaluate a solution that uses the original (indefinite) similarity measure, in the original Kreĭn space. This solution is the result of a stabilization procedure. We establish the correspondence between the stabilization problem (which has to be solved) and a classical SVM based on minimization (which is easy to solve). We provide simple equations to go from one to the other (in both directions). This link between stabilization and minimization problems is the key to obtaining a solution in the original Kreĭn space. Using KSVM, one can solve SVM with usually troublesome kernels (large negative eigenvalues or large numbers of negative eigenvalues). Experiments show that our algorithm KSVM outperforms all previously proposed approaches to deal with indefinite matrices in SVM-like kernel methods.


[Bonnet and father, 2015]
Bonnet, G. and father, P. (2015). Eleanor Bonnet. [ bib | www ]
[Canu et al., 2015]
Canu, S., Fournier, J., Grandvalet, Y., Labbé, B., Loosli, G., Quost, B., and Ralaivola, L. (2015). Procédé d'élaboration d'un discriminateur multiclasse. FR Patent FR2,993,381. [ bib ]


[Aboubacar et al., 2014]
Aboubacar, H., Barra, V., and Loosli, G. (2014). 3d shape retrieval using uncertain semantic query - a preliminary study. In ICPRAM 2014 - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods, ESEO, Angers, Loire Valley, France, 6-8 March, 2014, pages 600--607. [ bib | DOI | http ]
Recent technological progress has contributed to a huge increase in the number of 3D models available in digital form. Numerous applications have been developed to deal with this amount of information, especially for 3D shape retrieval. One of the main issues is to bridge the semantic gap between the shapes desired by users and the shapes returned by retrieval methods. In this paper, we propose an algorithm to address this issue. First, the user gives a semantic request. Second, a fuzzy 3D-shape generator sketches out suitable 3D shapes. Those shapes are filtered by the user or a learning machine to select the ones that match the semantic query. Then, we use a state-of-the-art retrieval method to return real-world 3D shapes that match this semantic query. We present results from an experiment. Three semantic concepts are learned and 3D shapes from the SHREC'07 database that match each concept are retrieved using our algorithm. The results are good and promising.
[Loosli and Aboubacar, 2014]
Loosli, G. and Aboubacar, H. (2014). Using svdd in simplemkl for 3d-shapes filtering. In CAp - 16eme Conférence d'apprentissage, pages 84--92. [ bib |  http | code | data ]
This paper proposes the adaptation of Support Vector Data Description (SVDD) to the multiple kernel case (MK-SVDD), based on SimpleMKL. It also introduces a variant called Slim-MK-SVDD that is able to produce a tighter frontier around the data. For the sake of comparison, the equivalent methods are also developed for One-Class SVM, known to be very similar to SVDD for certain shapes of kernels. Those algorithms are illustrated in the context of 3D-shape filtering and outlier detection. For the 3D-shape problem, the objective is to be able to select a sub-category of 3D shapes, each sub-category being learned with our algorithm in order to create a filter. For outlier detection, we apply the proposed algorithms to the unsupervised case as well as to the supervised case.
[Arouri et al., 2014]
Arouri, C., Nguifo, E. M., Aridhi, S., Roucelle, C., Bonnet-Loosli, G., and Tsopzé, N. (2014). Towards a constructive multilayer perceptron for regression task using non-parametric clustering. a case study of photo-z redshift reconstruction. CoRR, abs/1412.5513. [ bib | arXiv | http ]


[Bouges et al., 2013]
Bouges, P., Chateau, T., Blanc, C., and Loosli, G. (2013). Handling missing weak classifiers in boosted cascade: application to multiview and occluded face detection. EURASIP J. Image and Video Processing, 2013:55. [ bib | DOI | http ]
We propose a generic framework to handle missing weak classifiers at the testing stage in a boosted cascade. The main contribution is a probabilistic formulation of the cascade structure that considers the uncertainty introduced by missing weak classifiers. This new formulation involves two problems: (1) the approximation of posterior probabilities on each level and (2) the computation of thresholds on these probabilities to make a decision. Both problems are studied, and several solutions are proposed and evaluated. The method is then applied to two popular computer vision applications: detecting occluded faces and detecting faces in a pose different from the one learned. Experimental results are provided using conventional databases to evaluate the proposed strategies relative to basic ones.


[Bouges et al., 2012a]
Bouges, P., Chateau, T., Blanc, C., and Loosli, G. (2012a). Improving existing cascaded face classifier by adding occlusion handling. In The 21st IEEE International Symposium on Robot and Human Interactive Communication, IEEE RO-MAN 2012, Paris, France, September 9-13, 2012, pages 120--125. [ bib | DOI | http ]
[Bouges et al., 2012b]
Bouges, P., Chateau, T., Blanc, C., and Loosli, G. (2012b). Using k-nearest neighbors to handle missing weak classifiers in a boosted cascade. In Proceedings of the 21st International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, November 11-15, 2012, pages 1763--1766. [ bib | http ]


[Gandar et al., 2011]
Gandar, B., Loosli, G., and Deffuant, G. (2011). Dispersion effect on generalisation error in classification - experimental proof and practical algorithm. In ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, Volume 1 - Artificial Intelligence, Rome, Italy, January 28-30, 2011, pages 703--706. [ bib |  http ]
We consider the problem of generating learning data within the context of active learning in classification. First, we recall theoretical results proposing discrepancy as a criterion for generating samples in regression. We show that theoretical results about low-discrepancy sequences in regression problems are not adequate for classification problems. Second, we give a theoretical argument in favour of dispersion as a criterion for generating data. Then, we present numerical experiments that agree well with this theory.
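As a rough illustration of the dispersion criterion discussed in this abstract, the sketch below computes the dispersion of a one-dimensional point set on [0, 1], i.e. the largest distance from any location in the interval to its nearest sample. The function names, the restriction to the interval [0, 1] and the use of a van der Corput sequence as the well-spread sample are assumptions made for illustration; they are not taken from the paper.

```python
def van_der_corput(n, base=2):
    """Return the n-th term (n >= 1) of the van der Corput sequence in the given base."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def dispersion_1d(points):
    """Dispersion of a point set in [0, 1]: sup over x of the distance to the nearest point."""
    pts = sorted(points)
    if not pts:
        raise ValueError("need at least one point")
    # Distance from each boundary of [0, 1] to its closest sample.
    worst = max(pts[0], 1.0 - pts[-1])
    # Inside the interval, the farthest location is the midpoint of the largest gap.
    for left, right in zip(pts, pts[1:]):
        worst = max(worst, (right - left) / 2.0)
    return worst


# Hypothetical samples: 7 well-spread points versus 7 points packed near 0.1.
spread = [van_der_corput(i) for i in range(1, 8)]
clustered = [0.10 + 0.01 * i for i in range(7)]

print(dispersion_1d(spread))
print(dispersion_1d(clustered))
```

With the same number of points, the well-spread sample leaves a much smaller uncovered region than the clustered one, which is the property a dispersion criterion rewards.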
[Bouges et al., 2011]
Bouges, P., Chateau, T., Blanc, C., and Loosli, G. (2011). Face detection in a pose different than the one learned. [ bib ]
[Loosli and Canu, 2011]
Loosli, G. and Canu, S. (2011). Non positive svm. In OPT NIPS Workshop. [ bib ]
Learning SVM with non-positive kernels is a problem that has been addressed in recent years but is not really solved: either the kernel is corrected (as a pre-treatment or via a modified learning scheme), or it is used with some well-chosen parameters that lead to almost positive-definite kernels. In this work, we aim at solving the actual problem induced by non-positive kernels, i.e. solving the stabilization system in the space associated with the non-positive kernel. We first describe this stabilization system, then we present a simple algorithm based on the eigen-decomposition of the kernel matrix. While providing satisfying solutions, the proposed algorithm shows limitations in terms of memory storage and computational effort. The direct resolution is still an open question.
[Canu et al., 2011]
Canu, S., Loosli, G., and Rakotomamonjy, A. (2011). SVM and kernel machines. Lecture. [ bib | http | .pdf ]


[Gandar et al., 2010b]
Gandar, B., Loosli, G., and Deffuant, G. (2010b). How to generate data for approximating multidimensional surfaces? application to the approximation of viability kernels. In Workshop on Active Learning and Experimental Design, AISTATS Conference. [ bib ]
[Gandar et al., 2010a]
Gandar, B., Loosli, G., and Deffuant, G. (2010a). Démonstration : Comment répartir des points pour apprendre sans a priori. In Conférence francophone sur l'apprentissage automatique, Clermont-Ferrand (France). [ bib ]
We are interested in how to generate a finite number of training points in the context of active learning. This problem arises in various settings such as the optimization of meta-models in engineering, function approximation, or the modeling of interactions between variables without prior knowledge. It is well known that the more data there are, the better the model. However, obtaining data can be costly or destructive, notably in chemistry, geology, biology or computer simulations. Suppose the experimenter can obtain only n training points. How should the experimental design be generated without prior knowledge? The solution is to spread the points uniformly over the whole study space. But how can uniform points be generated? This question, which seems trivial at first glance, is in fact rather complex. One natural solution is to generate a sequence of points at random. Another is to use a regular grid. But the problem remains entirely open when the number of available points is not suited to such a layout.


[Bonnet and father, 2009]
Bonnet, G. and father, P. (2009). Marion Bonnet. [ bib | www ]
[Gandar et al., 2009b]
Gandar, B., Loosli, G., and Deffuant, G. (2009b). Les suites à dispersion faible comme bases d’exemples optimales pour l’apprentissage actif des formes. pages 347--350. [ bib | http ]
We wish to generate sets of examples suited to shape approximation problems (classification), without prior knowledge of the shape to be learned. We first show that the theoretical results favoring low-discrepancy sequences for regression problems are ill-suited to classification problems. We give theoretical arguments and simulation results showing that the dispersion of the training points is the relevant criterion to minimize in order to optimize learning performance in classification.
[Gandar et al., 2009a]
Gandar, B., Loosli, G., and Deffuant, G. (2009a). How to optimize sample in active learning: Dispersion, an optimum criterion for classification? [ bib ]
[Loosli and Canu, 2009b]
Loosli, G. and Canu, S. (2009b). Optimization in Signal and Image Processing, chapter Quadratic Programming and Machine Learning—Large Scale Problems and Sparsity, pages 111--135. ISTE Wiley. [ bib ]
For a child, learning is a complex process that consists of acquiring or developing certain competences on the basis of multiple experiences. For a machine, this learning process can be reduced to examples or observations that are used to improve performance. Machine learning can be seen as the optimization of criteria defined on examples. The higher the number of examples, the better the learning process. In terms of optimization, this learning process includes several specific problems. How are the criteria to be optimized defined? How to manage large amounts of data? Which algorithm is efficient in this context? When dealing with those problems, neural network approaches suggest using non-convex criteria associated with gradient descent methods. This procedure leads to several difficulties linked to the non-convex criteria. The key to the success of kernel-based methods (obtained about a decade after the introduction of neural networks) is their capacity to express the learning problem as a large-scale quadratic programming problem (convex). Kernel-based methods often lead to sparse solutions, i.e. a large number of their components equal zero. Based on this particularity, learning algorithms can solve large-scale problems in reasonable time. Solving this type of problem currently takes about one day of computation on a single-processor machine for 8 million unknowns. Among these 8 million unknowns, only 8,000 to 20,000 variables do not equal zero, depending on the complexity of the problem.
[Loosli and Canu, 2009a]
Loosli, G. and Canu, S. (2009a). Optimisation en traitement du signal et de l'image, chapter Programmation quadratique et apprentissage. Grande taille et parcimonie, pages 111--135. Hermes Science. [ bib | http ]
For a child, learning is a complex process that aims to acquire or develop certain abilities from diverse experiences. For a machine, it comes down to using examples or observations to improve its performance. Machine learning can thus be viewed as the optimization, over a sufficiently large set, of a criterion defined from examples; the more examples there are, the better the learning should be. From an optimization standpoint, this view raises specific problems: how to define the criterion to optimize, how to manage the mass of data, and how to find an efficient algorithm in this setting? To address these problems, neural-network learning methods propose using non-convex criteria together with gradient descent methods. This approach leads to many practical difficulties tied to the non-convexity of the criterion to be optimized. One of the keys to the success of kernel methods, achieved about ten years after the introduction of neural networks, is their ability to formulate the learning problem as a large and very particular (convex) quadratic programming problem. Indeed, with kernel methods the solution sought is often "sparse": a very large number of its components are zero. Taking advantage of this property, learning algorithms can now solve very large quadratic programming problems in reasonable time: one day of computation on average on a single-processor machine for 8 million unknowns. Of these 8 million unknowns, only on the order of 8,000 to 20,000 are nonzero, depending on the complexity of the problem.
[Gandar et al., 2009c]
Gandar, B., Loosli, G., and Deffuant, G. (2009c). Sélection de points en apprentissage actif. discrépance et dispersion: des critères optimaux? In MajecSTIC 2009, pages 8--p. [ bib ]


[Gandar et al., 2008]
Gandar, B., Deffuant, G., and Loosli, G. (2008). Les suites à discrépance faible: un moyen de réduire le nombre de vecteurs supports des svms. [ bib ]
[Loosli et al., 2008]
Loosli, G., Deffuant, G., and Canu, S. (2008). Balk: Bandwidth autosetting for svm with local kernels application to data on incomplete grids. In Conférence francophone sur l'apprentissage automatique, Ile de Porquerolles, France. [ bib ]
This paper focuses on the use of Support Vector Machines (SVM) when learning data located on incomplete grids. We identify here two typical behaviours to be avoided, that we call holes. Holes are regions of the space with no training data where the decision changes. We propose a novel algorithm which aims at preventing holes from appearing. It automatically selects the local kernel bandwidth during training. We provide hard-margin and soft-margin versions and several experimental results. Even though our method is designed for a specific application, it turns out that it can be applied to more general problems.


[Bonnet and father, 2007]
Bonnet, G. and father, P. (2007). Martin Bonnet. [ bib | www ]
[Loosli and Canu, 2007]
Loosli, G. and Canu, S. (2007). Comments on the "core vector machines: Fast SVM training on very large data sets". Journal of Machine Learning Research, 8:291--301. [ bib | .html ]
In a paper recently published in JMLR, the authors present an algorithm for SVM called Core Vector Machines (CVM) and illustrate its performance through comparisons with other SVM solvers. After reading the CVM paper we were surprised by some of the reported results. In order to clarify the matter, we decided to reproduce some of the experiments. It turns out that, to some extent, our results contradict those reported. Reasons for these different behaviors are given through the analysis of the stopping criterion.
[Loosli et al., 2007a]
Loosli, G., Canu, S., and Bottou, L. (2007a). Large scale kernel machines, chapter Training invariant support vector machines using selective sampling, pages 301--320. MIT. [ bib ]
In this chapter we present the combination of two approaches to build a very large SVM. The first method from [?] proposes a strategy for handling invariances in SVMs. It is based on the well-known idea that small deformations of examples should not change their class. The deformed samples are selected or discarded during learning (this is selective sampling). The second approach is LASVM. It is an efficient online algorithm (each training point is only seen once) that also uses selective sampling. We present state-of-the-art results obtained on a handwritten digit recognition problem with 8 million points on a single processor. This work also demonstrates that online SVMs can effectively handle really large databases.
[Loosli et al., 2007b]
Loosli, G., Gasso, G., and Canu, S. (2007b). Regularization paths for ν-svm and ν-svr. In Liu, D., Fei, S., Hou, Z., Zhang, H., and Sun, C., editors, Advances in Neural Networks -- ISNN 2007: 4th International Symposium on Neural Networks, ISNN 2007, Nanjing, China, June 3-7, 2007, Proceedings, Part III, pages 486--496, Berlin, Heidelberg. Springer Berlin Heidelberg. [ bib | DOI | http ]
This paper presents the ν-SVM and the ν-SVR full regularization paths along with a leave-one-out inspired stopping criterion and an efficient implementation. In the ν-SVR method, two parameters are provided by the user: the regularization parameter C and ν, which settles the width of the ε-tube. In the classical ν-SVM method, parameter ν is a lower bound on the fraction of support vectors in the solution. Based on previous works, extensions of regularization paths for SVM and SVR are proposed that make it possible to compute the solution path automatically by varying ν or the regularization parameter.


[Loosli et al., 2006b]
Loosli, G., Lee, S., and Rakotomamonjy, A. (2006b). Perception d'états affectifs et apprentissage. Revue d'Intelligence Artificielle, 20(4-5):553--582. [ bib | DOI | http ]
This article deals with the problem of affective states recognition from physical and physiological wearable sensors. Given the complex nature of the relationship between available signals and affective states to be detected we propose to use a statistical learning method. We begin with a discussion about the state of the art in the field of statistical learning algorithms and their application to affective states recognition. Then a framework is presented to compare different learning algorithms and methodologies. Using the results of this pre-study, a global architecture is proposed for a real time embedded recognition system. Instead of directly recognizing the affective states we propose to begin with detecting abrupt changes in the incoming signal to segment it first and label each segment afterwards. The interest of the proposed method is demonstrated on two real affective state recognition tasks.
[Loosli et al., 2006a]
Loosli, G., Canu, S., and Bottou, L. (2006a). Svm et apprentissage des très grandes bases de données. [ bib ]
The aim of this work is to show that classification with Support Vector Machines (SVM) is possible on very large databases (millions of examples, hundreds of features and about ten classes). To handle this mass of data, we propose using an online algorithm in which examples are presented one after another. This approach allows both a fast update of the solution (which depends on only one example at a time) and efficient management of the training set (which does not have to fit entirely in memory). The target application is character recognition with invariances in the data taken into account. To this end, we adapt the LASVM algorithm (an online method for SVMs), drawing on [?] to incorporate prior knowledge about invariance.
[Delorme and Loosli, 2006]
Delorme, F. and Loosli, G. (2006). Un outil générique pour l’analyse automatique et la visualisation de productions d’apprenants. [ bib ]
The automatic analysis of learner productions is a central problem in the context of EIAH (computer-based environments for human learning), and more particularly of Intelligent Tutoring Systems (ITS). In this article we propose a generic tool for the assisted analysis of learner productions. This tool should allow an EIAH designer to easily instantiate an evaluation module for a particular EIAH. The instantiated tool then enables automatic analysis of a learner's production, while giving the tutor an overall view of the level of the learners taking part in the course. The article presents the approach in detail, along with some examples of the tool's use.
[Loosli, 2006]
Loosli, G. (2006). Méthodes à noyaux pour la détection de contexte. PhD thesis, INSA de Rouen. [ bib ]
Making applications aware of their context opens a significant number of perspectives in human-computer interaction. The large number of studies aiming to determine how to use context shows the need for context retrieval. By analyzing the machine learning requirements of each task related to context detection, we encountered a certain number of limits. Consequently, most of the work carried out during this thesis relates to machine learning from a general point of view. The goal is to achieve autonomous and enduring learning. By autonomous, we mean learning that does not require the intervention of a specialist. This implies using methods that can set their own parameters and be used online. By enduring, we refer to a realistic use of the applications, i.e. real time, therefore fast, online and stable, for a very large amount of data. Because SVMs give precise results, we focused our work on this method. However, SVMs are far from fulfilling the requirements of autonomous and enduring learning. Autonomous learning is not only subject to the need for an efficient solver. It is also limited by the presence of hyper-parameters. In the case of SVMs, these hyper-parameters are relatively few. However, a single one is enough to make a method dependent on a form of supervision that contradicts either the need for online training or the objective of independence from human intervention. We studied this problem via regularization paths. Regularization paths make it possible to know all the solutions of a problem with respect to the bias-variance trade-off. For SVMs, this trade-off is governed by one of the hyper-parameters, and we thus use the regularization path to obtain an automatic adjustment of this hyper-parameter. We have not yet reached the stage of a push-button SVM, but we show that the limits of SVMs are not all insurmountable.
For the possible size of the databases, we implemented the largest SVM to date on only one processor with 80 million points in dimension 784, by using the online method LASVM.
[Loosli and Canu, 2006]
Loosli, G. and Canu, S. (2006). Auto-setting the svm hyper-parameters using regularization paths. [ bib ]


[Loosli et al., 2005a]
Loosli, G., Canu, S., Vishwanathan, S., and Smola, A. (2005a). Invariances in classification: an efficient svm implementation. In Applied Stochastic Models and Data Analysis, Brest, France. [ bib ]
Often, in pattern recognition, complementary knowledge is available. This could be useful to improve the performance of the recognition system. Part of this knowledge regards invariances, in particular when treating images or voice data. Many approaches have been proposed to incorporate invariances in pattern recognition systems. Some of these approaches require a pre-processing phase, others integrate the invariances in the algorithms. We present a unifying formulation of the problem of incorporating invariances into a pattern recognition classifier and we extend the SimpleSVM algorithm to handle invariances efficiently.
[Loosli et al., 2005e]
Loosli, G., Lee, S.-G., and Canu, S. (2005e). Context changes detection by one-class svms. In Proceedings Workshop on Machine Learning for User Modeling: Challenges. [ bib ]
We are interested in a low-level part of user modeling: detecting changes in the context of the user. We present a method to detect, online, changes in the context of a user equipped with non-invasive sensors. Our aim is to provide, in real time, series of unlabeled contexts that can be classified and analyzed at a higher level.
[Loosli et al., 2005f]
Loosli, G., Lee, S.-G., and Canu, S. (2005f). Context retrieval by rupture detection. [ bib ]
This paper presents an algorithm for abrupt change detection in signals, applied to physiological data. We are working on automatic context detection through wearable computers and sensors integrated in clothes. Among the many problems that exist in this area, we are particularly interested in the automatic detection of changes in the state of the user, in order to develop context-aware applications. For this purpose, the algorithm described here is based on one-class support vector machines. We also illustrate our algorithm on two experiments in which data are collected through biological sensors and accelerometers.
[Loosli et al., 2005d]
Loosli, G., Lee, S., and Canu, S. (2005d). Rupture detection for context aware applications. In Proceedings of the First International Workshop on Personalized Context Modeling and Management for UbiComp Applications, ubiPCMM 2005, Tokyo, Japan, September 11, 2005. [ bib | .pdf ]
Automatic context detection through wearable computers and sensors integrated in clothes is the question we address in this paper. Among the many problems that exist in this area, we are particularly interested in the automatic detection of changes in the state of the user, in order to develop context-aware applications. This paper presents a machine learning method for rupture detection, based on one-class support vector machines. It also illustrates our algorithm on two experiments in which data are collected by biological sensors and accelerometers.
[Loosli et al., 2005c]
Loosli, G., Lee, S., and Canu, S. (2005c). Context retrieval by rupture detection. In Actes de CAP 05, Conférence francophone sur l'apprentissage automatique - 2005, Nice, France, du 31 mai au 3 juin 2005, pages 111--112. [ bib ]
This paper presents an algorithm for abrupt change detection in signals, applied to physiological data. We are working on automatic context detection through wearable computers and sensors integrated in clothes. Among the many problems that exist in this area, we are particularly interested in the automatic detection of changes in the state of the user, in order to develop context-aware applications. For this purpose, the algorithm described here is based on one-class support vector machines. We also illustrate our algorithm on two experiments in which data are collected through biological sensors and accelerometers.
[Loosli, 2005]
Loosli, G. (2005). Lasvm applied to invariant problems. NIPS workshop on Large Scale Kernel Machines [ bib |  http |  pdf ]
[Loosli et al., 2005b]
Loosli, G., Canu, S., Vishwanathan, S. V. N., Smola, A. J., and Chattopadhyay, M. (2005b). Boîte à outils SVM simple et rapide. Revue d'Intelligence Artificielle, 19(4-5):741--767. [ bib | DOI | http | code v2.31 | code v4 ]
Although SVMs (Support Vector Machines) are now considered among the best learning methods, they are still regarded as slow. Here we propose a Matlab toolbox that makes SVMs fast and simple to use. It relies on SimpleSVM, a projected gradient method that is well adapted to the problem. We chose to implement this algorithm in the Matlab environment since it is user-friendly and efficient: it uses the ATLAS (Automatically Tuned Linear Algebra Software) library. The comparison with the state of the art in this field, SMO (Sequential Minimal Optimization), shows that in some cases our solution is faster and less complex. To show how fast and simple our method is, we give results on the MNIST database. A satisfying solution could be computed in a fairly short time (one hour and a half on a Linux PC to compute 45 binary classifiers, with 60000 samples in dimension 576). Moreover, we show how this algorithm can be extended to problems such as invariances, by breaking the problem down into small pieces so that the solution is obtained with a single run of the Invariant SimpleSVM.


[Loosli et al., 2004]
Loosli, G., Canu, S., Vishwanathan, S. V. N., Smola, A. J., and Chattopadhyay, M. (2004). Une boîte à outils rapide et simple pour les SVM. CAp 2004. [ bib |  http ]
Although SVMs (Support Vector Machines) are now recognized as one of the best learning methods, they are still considered slow. We propose here a Matlab toolbox that makes SVMs simple and fast to use, thanks to a projected gradient method particularly well suited to the problem: SimpleSVM (Vishwanathan et al., 2003). We chose to code this algorithm in the Matlab environment to benefit from its user-friendliness while ensuring good efficiency. Comparing our solution with the state of the art in the field, SMO (Sequential Minimal Optimization), shows that it is in some cases faster and of lower complexity. To illustrate the simplicity and speed of our method, we finally show that on the MNIST database it was possible to obtain satisfactory results in a relatively short time (one and a half hours of computation on a Linux PC to build 45 binary classifiers on 60,000 examples in dimension 576).


[Loosli et al., 2003]
Loosli, G., Canu, S., and Rakotomamonjy, A. (2003). Détection des activités quotidiennes à l'aide des séparateurs à vaste marge. RJCIA, France, pages 139--152. [ bib |  http ]
A current issue, brought about by the growing complexity of technological devices, is the ability of a system to adapt to the user's situation without the user having to worry about it. With this in mind, we propose to determine a person's movement behavior from non-intrusive sensors (accelerometers). The overall method consists in building features likely to carry information from the sensor signals, keeping the best ones, and applying a classification method. The step of the process we focused on is the selection of relevant variables: how can we keep only the useful features? To answer this question, we compared our approach (variable selection by a global approach (Grandvalet & Canu, 2002)) with the reference methods in the field. Our methods achieve up to 99% correct classification offline. These results make it possible to consider an extension to online mode for an application in context detection.
[Thirion et al., 2003]
Thirion, B., Loosli, G., Douyère, M., and Darmoni, S. J. (2003). Metadata element set in quality-controlled subject gateway: a step to a health semantic web. In The New Navigators: from Professionals to Patients - Proceedings of MIE2003, Saint Malo, France, pages 707--712. [ bib | DOI | http ]