Alex Andoni investigates the algorithmic foundations of massive data. In particular, he is interested in discovering the fundamental design principles of algorithms for efficient processing of large datasets.
A prototypical problem is that of similarity search: given a set of objects such as images, find all pairs of similar objects quickly, i.e., faster than comparing all possible pairs, which is infeasible at the scale of modern datasets. This problem has numerous applications spanning many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, and bioinformatics, and it underlies the classic "nearest neighbor" classification rule in machine learning.
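To make the baseline concrete, the following is a minimal illustrative sketch of the naive quadratic-time approach that such algorithms aim to beat; the function names and the Euclidean-distance threshold formulation are assumptions for illustration, not a specific construction from this research.

```python
from itertools import combinations


def all_close_pairs(points, threshold):
    """Find all pairs of points within `threshold` of each other.

    This is the O(n^2) brute-force baseline: every pair is compared.
    Sublinear techniques such as locality-sensitive hashing aim to
    avoid this exhaustive comparison at scale.
    """
    def dist(p, q):
        # Euclidean distance between two equal-length tuples.
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(points), 2)
            if dist(p, q) <= threshold]
```

For example, on three points where only the first two are close, the function returns just that one pair, while still having examined all three pairs.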
Andoni develops new algorithmic design principles by leveraging advanced mathematical primitives for representing and manipulating data efficiently. For example, the above problem admits particularly fast algorithms if one can represent the objects using small "sketches" that preserve the relevant structure (the "similarity" in this case). The challenge then becomes to find the best sketching primitive. Another general theme is the study of models and algorithms under severely restricted resources, such as space, time, measurements, samples, or communication. Such questions are central to areas including sublinear algorithms (namely, streaming algorithms, property testing, and compressive sensing), high-dimensional geometry, and metric embeddings. Andoni has used and developed tools in these areas, collaborating with experts in theoretical computer science, mathematics, machine learning, and statistics, and ultimately providing novel approaches to many computational problems on massive datasets.
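The sketching idea above can be illustrated with a classic random-hyperplane (SimHash-style) sketch for cosine similarity; this is a well-known textbook construction shown here for illustration, not one of Andoni's specific results, and all names and parameters are assumptions.

```python
import random


def make_hyperplanes(dim, k, seed=0):
    # k random hyperplanes in R^dim with Gaussian entries,
    # seeded for reproducibility.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]


def simhash(vec, hyperplanes):
    # One bit per hyperplane: which side of it the vector falls on.
    # Vectors at a small angle agree on most bits, so the Hamming
    # distance between sketches estimates the angle between vectors.
    return [1 if sum(v * h for v, h in zip(vec, hp)) >= 0 else 0
            for hp in hyperplanes]


def hamming(a, b):
    # Number of positions where two bit-sketches disagree.
    return sum(x != y for x, y in zip(a, b))
```

Each object is compressed to k bits, so comparing two sketches is far cheaper than comparing the original high-dimensional objects; note that the sketch is invariant to positive scaling of the input, reflecting that it captures direction (angle) rather than magnitude.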