Scalable Machine Learning on Data Sequences:
The Journey from Offline to Real-Time
By Themis Palpanas
Applications in diverse domains increasingly need techniques that can analyze very large collections of static and streaming sequences (a.k.a. data series), often in real time. Examples of such applications come from machine operation monitoring, astrophysics, and a multitude of other scientific and application domains that need to apply machine learning techniques for knowledge extraction. It is not unusual for these applications to involve hundreds of millions to billions of data series, which are often not analyzed in full detail because of their sheer size. However, no existing data management solution (such as relational databases, column stores, array databases, and time series management systems) offers native support for sequences and the corresponding operators necessary for complex analytics. In this talk, we argue for the need to study the theory and foundations of managing big data sequences, and to build corresponding systems that will enable scalable management and analytics of very large sequence collections. We describe recent efforts in designing techniques for indexing and analyzing truly massive collections of data series that will enable scientists to run complex analytics on their data interactively. Finally, we present our vision for the future of big sequence management research, including promising directions in storage, distributed processing, and query benchmarks.
Themis Palpanas is a Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and a professor of computer science at the University of Paris (France), where he is director of diNo, the data management group. He received the BS degree from the National Technical University of Athens, Greece, and the MSc and PhD degrees from the University of Toronto, Canada. He has previously held positions at the University of California at Riverside, the University of Trento, and the IBM T.J. Watson Research Center, and has visited Microsoft Research and the IBM Almaden Research Center. His interests include problems related to data science (big data analytics and machine learning applications). He is the author of nine US patents, three of which have been implemented in world-leading commercial data management products. He is the recipient of three Best Paper awards and the IBM Shared University Research (SUR) Award. He is currently serving on the VLDB Endowment Board of Trustees, as an Editor in Chief for the BDR Journal, as an Associate Editor for the TKDE and IDA journals, on the Editorial Advisory Board of the IS journal, and on the Editorial Board of the TLDKS Journal. He has served as General Chair for VLDB 2013, Associate Editor for VLDB 2019 and 2017, Research PC Vice Chair for ICDE 2020, Workshop Chair for EDBT 2016, ADBIS 2013, and ADBIS 2014, General Chair for the [email protected] International Workshop (in conjunction with VLDB 2014), and General Chair for the Event Processing Symposium 2009.
Opening the Black Box: Explorations and Explanations by means of Rules
By Elena Baralis
Machine learning algorithms dig deep into large amounts of data to extract patterns that highlight new insights and to derive models that predict new variables. Data exploration and analysis are gaining more and more momentum. However, many powerful techniques produce results that are hard to interpret.
Rules are a powerful tool to describe knowledge, both when exploring data and building models. In this talk I will cover diverse types of rules, aiming at providing abstract representations of data by means of patterns, building interpretable models, and explaining black box model predictions.
Elena Baralis is a full professor at Politecnico di Torino, Italy. She has been the head of the Computer Engineering Department since November 2019. She holds a laurea degree in Electrical Engineering and a Ph.D. in Computer and Systems Engineering, both from Politecnico di Torino. She lectures on database systems, database systems technology, data warehousing, and data mining.
Her current research interests are in the overlapping fields of database systems and machine learning, more specifically in mining algorithms for big databases and sensor data analysis. Her research activity focuses on the study of algorithms for diverse data mining tasks on big data, including association rule mining to discover correlations among data at different abstraction levels, extraction of knowledge for both performing and explaining predictions, and extraction of summaries from textual data (summarization task). She has published over 150 papers in international journals and conference proceedings.