Professor Sitan Chen’s love of problem solving has guided him through his interdisciplinary journey from a math undergraduate at Harvard, through a PhD in Electrical Engineering and Computer Science at MIT and a postdoc at UC Berkeley, to his current role as an Assistant Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. As a theoretical computer scientist, Sitan’s research encompasses both classical and quantum subject matter.
By Albert Zhu and Haley Nguyen
June 10, 2024
Professor Sitan Chen.
Sitan is generally interested in distribution learning. Broadly speaking, a distribution describes the shape that data take. For example, consider rolling two dice and adding their values. If you repeat this many times and plot a histogram, you’ll see a probability distribution like the one below.
Example of the distribution you get when you sum the values of two dice.
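The dice experiment is easy to reproduce. Here is a minimal sketch (using NumPy, with an arbitrary seed and sample count chosen purely for illustration) that simulates many rolls and recovers the triangular shape of the distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Roll two dice 100,000 times and sum each pair of values.
rolls = rng.integers(1, 7, size=(100_000, 2)).sum(axis=1)

# Empirical probability of each possible sum, 2 through 12.
counts = np.bincount(rolls, minlength=13)[2:]
empirical = counts / counts.sum()

for total, p in zip(range(2, 13), empirical):
    print(f"P(sum = {total:2d}) ≈ {p:.3f}")
```

The histogram peaks at 7, which 6 of the 36 equally likely outcomes produce, so its empirical frequency should land close to 1/6.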
Many distributions have a mathematical description that tells you a lot of statistics about your data. We often assume or predict that certain data should follow certain distributions. For example, a college professor might expect that scores on an exam follow a Gaussian distribution.
Oftentimes, however, we don’t know the underlying distribution that is generating our data, and that is where the field of distribution learning comes into play. Broadly, distribution learning involves drawing samples from an unknown distribution and then using an algorithm to infer what the distribution is based on the samples you have drawn. The goal is to be able to determine this distribution accurately with a small number of samples relative to the “complexity” of the distribution.
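As a minimal illustration of this sample-based viewpoint (the numbers here are invented for the sketch, echoing the exam-score example above): if we assume the unknown distribution belongs to the Gaussian family, "learning" it reduces to estimating a mean and a standard deviation, and the estimates tighten as we draw more samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth that the learner does not know (hypothetical exam scores).
true_mean, true_std = 70.0, 10.0

def learn_gaussian(n_samples: int) -> tuple[float, float]:
    """Draw samples, then infer the distribution's parameters (Gaussian MLE)."""
    samples = rng.normal(true_mean, true_std, size=n_samples)
    return samples.mean(), samples.std()

# More samples -> a more accurate estimate of the underlying distribution.
for n in (10, 100, 10_000):
    mean_hat, std_hat = learn_gaussian(n)
    print(f"n={n:6d}: mean ≈ {mean_hat:6.2f}, std ≈ {std_hat:5.2f}")
```

The interesting theoretical question, as described above, is how few samples suffice relative to the complexity of the distribution; for a single Gaussian the answer is modest, but for richer families it can grow quickly.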
Sitan’s research includes both classical and quantum problems. Distributions in quantum data may look a bit different from those in classical data, due to the underlying quantum mechanics. Sitan is interested in what kinds of claims we can make about the performance of real-life quantum computers given their limitations. While physicists and engineering faculty might think of this subject in terms of physical hardware or the underlying physics, Sitan’s approach as a theoretical computer scientist is to think about what these quantum devices can do in terms of statistical or mathematical arguments.
As our HQI community grows larger, we’d like to offer insight into the individuals who make our innovative research possible. In the following interview, Prof. Chen takes us through his career journey. Our conversation has been edited for ease of reading.
SC: I went through a lot of shifts. At an early age, I knew I wanted to do something related to math. I went into college thinking I wanted to pursue a career in academia and in pure math. Then in the middle of college, I heard about theoretical computer science for the first time. I remember very clearly taking Prof. Salil Vadhan's graduate complexity theory class and reading his monograph on pseudorandomness, which got me really hooked on this area.
I went into grad school wanting to do research related to complexity theory, but that also changed very quickly. Through working with my advisor, Prof. Ankur Moitra, who specializes in algorithmic aspects of machine learning, I began to transition from concepts that I was more familiar with in undergrad, which were primarily of an algebraic nature, to tools involving, for instance, high dimensional probability. I learned that there are lots of these kinds of mathematical tools and theoretical computer science questions squarely in the area of theoretical machine learning. And that's how I converged on where I am now.
SC: It goes without saying that my advisor Prof. Ankur Moitra was an important mentor and had a huge influence on my research tastes. Another one of my mentors is my postdoc host, Prasad Raghavendra, a professor in computer science at Berkeley. Before my postdoc, I spent the summer of my second year in grad school working with Prasad. He had a summer project prepared for me, but I really wanted to work on a different project. Although Prasad insisted my proposed project was too hard, I convinced him to work on it together. We ended up spending over a year on this project and joined forces with another one of my mentors, Raghu Meka. The arc of the project was very dramatic: there were lots of ups and downs, and we eventually gave up after making little progress.
I think this was actually one of my formative moments in grad school because it taught me a lot about what to do when feeling stuck in research. For example, it taught me that in moments when you're stuck, especially when it's one of your first major projects, that can be an opportunity to just learn random topics that you feel might be useful for the problem that you're working on. That drive to learn specific tools is a feeling that's hard to develop without the right motivation. Over the course of that project, I had the opportunity to learn about quantum information because I thought it would be useful for the project. Whether or not that was true is unclear, but it definitely laid the foundations for what was to come in my career.
SC: Yeah, I think this can be tricky. My approach is to work on things that are both fundamental and that I enjoy thinking about. As a researcher, you have the amazing privilege to just be able to pick a particular topic that you like. You're literally being paid, especially if you're doing theory, to just stare at a piece of paper and bang your head on the desk. Given this incredible privilege, I think it's definitely important to pick problems that are not in a vacuum. If I'm studying some particular object, what are concrete reasons for me to think that this object is something fundamental? One good indicator is if it’s something many different communities are studying, but just in slightly different forms, and under different names. So if you can find topics that have interest across many areas, but also align with your general research tastes, I think it's fine to worry less about immediate practical impact; impact, especially in theory, takes time and is hard to predict.
SC: I think navigating between different fields is a very common experience among theoretical computer scientists, because the mindset of computer scientists is well suited to tackling areas that might seem very different: we tend to pose problems in general terms. Of course, when you enter new fields, you feel like a total beginner. One thing that’s definitely a tricky aspect of interdisciplinary research is that you quickly find that different communities have completely different ways of referring to the same thing. Not just definitions and terminology, but also their general way of thinking: what is the “right” version of the problem to study? What is an interesting task that I would want to design an algorithm for? What does it mean for a problem to be considered impossible or tractable? I think people interested in this kind of cross-disciplinary research should be cognizant of these differences.
Another piece of advice is, often with interdisciplinary research, there's this temptation to put together buzzwords. Maybe I'm an expert in xyz, and another field seems to care about xyz, so let me just combine these two and call it a day. My preference is for interdisciplinary research that arises more organically. My first research project bridging ideas from classical and quantum machine learning, for instance, sort of happened by accident during my internship at Microsoft Research. My mentor at Microsoft, Jerry Li, and I both jumped into this project because we felt that one of the algorithms we developed for another project could be applicable as an algorithm for tomography using single copy measurements. [Editors’ note: tomography is reconstructing a state out of measurements, like constructing a 3D picture from 2D photos. It is very common to use tomography when we are studying quantum systems, where we can’t measure everything at once. It’s also typical that we only have a single copy of a quantum state.] We spent many, many months trying to make those algorithms work. And, we ended up actually proving a lower bound demonstrating that not only does our algorithm fail in this task, but no algorithm using these types of measurements can do better. It turns out this kind of lower bound is closely related to a research direction that many in the classical community care about, which is understanding how computational constraints affect statistical complexity for basic inference problems. The main moral of this story is that people in the quantum community posed similar questions in the past, but arguably not from the same vantage point of computational-statistical tradeoffs. But as soon as you phrase it in that way, you realize there's this entire bridge between these two communities.
Learning a quantum process. A schematic of learning a quantum process by using classical machine learning to extract low-complexity information about it. Image via Hsin-Yuan Huang.
SC: To some extent, yes, but I think this is still a work in progress. To give one example, in my “classical life,” one of the main challenges I've been obsessed with is determining the right distributional assumptions under which you can prove that a learning problem is tractable. Put simply, what do I need to assume about the underlying process generating my data in order to be able to prove that I can learn things about my data? A recurring conundrum in computational learning theory is that unless you make very strong assumptions, it's typically quite hard to prove anything. One instance in which the quantum picture helped make progress on this general problem was a recent project I worked on with Robert Huang and John Preskill on learning unknown quantum processes. This actually opened my eyes to the interplay between distributional assumptions and a classic algorithm from computational learning theory called the low-degree algorithm. Long story short, the surprising finding in this project was that if you want to predict the input-output behavior of a quantum channel on random inputs, it is possible to do this even with a very limited amount of “snapshots” of the channel’s behavior. This is counterintuitive because the channel could be arbitrarily complex, and naively you might expect that the amount of data and compute needed for such a task is exponential. But because we only care about learning the statistics of its input-output behavior, it turns out this can be done much more efficiently than trying to recover a full description of the quantum channel.
But what does this have to do with classical machine learning? The task I just described is the natural quantum generalization of the classical task of supervised learning: given a bunch of input-output pairs drawn from some distribution (e.g. the input might be a picture of a cat or dog, and the output might be the label “CAT” or the label “DOG”), learn to predict the output given a randomly drawn input. If you make no assumptions about the input distribution or about the function mapping inputs to outputs, you need an exponential amount of data and compute. But in a certain sense, the quantum result showed that if you’re willing to assume the input distribution takes a certain form, not one that is necessarily intuitive to posit on the classical side but which is entirely natural on the quantum side, then for arbitrary functions, you can learn efficiently.
This was one example where looking at a quantum problem shed light on a classical machine learning problem from an angle I might not have considered from a purely classical perspective. Hopefully, there will be more such examples in the coming years.
SC: I view myself as a theoretical computer scientist but don’t necessarily associate with a particular sub-area. I can share my thoughts on the general field of learning theory, especially the part of learning theory that's been most adjacent to theoretical computer science. Over the past decade, there's been an interesting influx of models and problem formulations coming from machine learning. One development that I've been most excited about is this line of work on diffusion generative modeling. This actually ties back to what I mentioned earlier, studying objects that many different people have accidentally been looking at under different names, of which diffusions are a particularly nice example. As a theoretician, I'm excited about this because there's been a lot of focus in theoretical computer science on general questions of sampling and distribution learning: I give you a description of a distribution (e.g. in the form of a Hamiltonian), or I give you samples from it, and I want to be able to generate fresh samples from the distribution. Diffusion models are a rare example of an algorithm coming from practice that actually gives the best known theoretical guarantees for some of these problems. For example, Ahmed El Alaoui, Andrea Montanari, and Mark Sellke showed rigorous sampling guarantees for spin glasses using diffusion models in a setting where no other algorithm had been proven to work. Usually it's the other way around: algorithms in practice work great empirically, but in the toy models we develop to understand them, we can only prove theorems about them in settings where some other algorithm from theory was already known to work. Yet now we have these very interesting probabilistic objects coming from generative AI that sort of challenge this status quo. For my sub-area of folks who care about provable algorithms for distribution learning, these are very exciting times.
Diffusion models. Image via Sitan Chen.
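To make the sampling framing concrete, here is a toy one-dimensional sketch of the diffusion idea (not the algorithms from the work discussed above, and with all constants chosen for illustration): when the data distribution is a single Gaussian, the score of the noised distribution is known in closed form, so we can run the reverse process exactly and check that it regenerates the data distribution. In a real diffusion model, the score function below would be learned by a neural network from samples.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data distribution: N(mu, sigma^2).
mu, sigma = 2.0, 0.5
T, n_steps, n_samples = 5.0, 500, 20_000
dt = T / n_steps

def score(x, t):
    """Exact score (gradient of log-density) of the forward-noised distribution.

    Forward process: Ornstein-Uhlenbeck, dx = -x dt + sqrt(2) dW, whose
    marginal at time t is N(mu * e^{-t}, sigma^2 e^{-2t} + 1 - e^{-2t}).
    """
    m = mu * np.exp(-t)
    v = sigma**2 * np.exp(-2 * t) + 1 - np.exp(-2 * t)
    return -(x - m) / v

# Start from (approximately) the stationary distribution N(0, 1) ...
x = rng.standard_normal(n_samples)

# ... and integrate the reverse SDE, dx = [x + 2*score(x, t)] ds + sqrt(2) dW,
# with Euler-Maruyama, running time backwards from T to 0.
for i in range(n_steps):
    t = T - i * dt
    x += (x + 2 * score(x, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_samples)

print(f"recovered mean ≈ {x.mean():.2f} (target {mu})")
print(f"recovered std  ≈ {x.std():.2f} (target {sigma})")
```

The final samples `x` are fresh draws whose mean and standard deviation match the original data distribution, which is exactly the "generate fresh samples" task described above.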
SC: Pretty intimidating. I’m very fortunate however to be at an institution where all the senior faculty are genuinely invested in the junior faculty’s success. They are fully supportive and willing to provide all their wisdom to help you succeed.
SC: In high school, I did a bit of competitive math. I wasn't great at it, but I did enjoy problem solving of that flavor. It’s been a pleasant surprise that the work I’ve ended up doing professionally feels, in a way, very similar. Another surprise is that when I was thinking of doing math at the beginning of undergrad, I kind of assumed that I would be doing something much more abstract. Yet the mathematical objects I get most excited about these days, and the research questions that ultimately keep me up at night, ended up being quite “down to earth.”