Petasky is a subproject of the CNRS MASTODONS challenge.

Here is the introduction of the project:

In many scientific fields, such as physics, astronomy, biology, or environmental science, the rapid development of scientific instruments and the intensive use of computer simulation have led in recent years to the production of large amounts of data. Modern scientific applications therefore face new problems, primarily related to storing and using these data. Beyond the growing volume of data to handle, the main sources of difficulty are their complex nature (images, uncertain data, multiscale …), the heterogeneity of their formats, and the variety of processing to which they are subjected. The problems are such that scientific data management is now recognized as a real bottleneck that slows scientific research, which relies more and more on the analysis of massive data. In this context, the role of computer science as a direct means of improving the process of scientific discovery is paramount. This has led scientists from different disciplines to work together on new tools, approaches, and techniques for managing and exploiting these gigantic masses of data. This is the case, for example, of the two conferences XLDB (Extremely Large Data Bases) and SciDB (Scientific Data Bases).

The work presented in this project addresses the problem of managing scientific data in the field of cosmology, and therefore fits within the context of e-Science. It brings together researchers and engineers from research laboratories and computer scientists from IN2P3/CNRS laboratories. The data will be produced by two scientific projects, LSST (Large Synoptic Survey Telescope) and Euclid. Handling these data involves an unprecedented level of complexity.

I’m involved in the Data-Mining and Machine Learning team of the Petasky project and currently work on redshift evaluation.