ISU ECpE

Welcome to the ISU ECpE podcast, from the Iowa State University Department of Electrical and Computer Engineering (ECpE). Here in ECpE, The Future Is What We Do!

Episode 6: Distributed Computing and Machine Learning with Professor Aditya Ramamoorthy

January 27, 2022 • Santosh Pandey • Season 1 • Episode 6

In this episode, our guest is Professor Aditya Ramamoorthy from the Department of Electrical and Computer Engineering (ECpE) at Iowa State University (ISU). We talk about his research in distributed computing, caching, data storage, and machine learning, along with resources for the professional development of our students. This episode was conceptualized, recorded, edited, and produced by Santosh Pandey from the ECpE Department of Iowa State University. The transcript was prepared and edited by Yunsoo Park, Ankita and Richa Pathak from the ISU ECpE Department. Communications and digital hosting were handled by Kristin Clague from the ISU ECpE Department. The music was provided by penguinmusic from Pixabay (Track Title: Inspiring Ambient).

Welcome to our ECpE podcast series, where we talk about exciting activities within our department. I'm your host, Santosh Pandey. Our guest today is Professor Aditya Ramamoorthy from Electrical and Computer Engineering at Iowa State University. Aditya, thank you for joining us today. We want to talk about your research in distributed computing, caching, data storage, and machine learning from a perspective that will be valuable to our student listeners. To start with, could you describe the field of distributed computing and its technical challenges in the present era?
First of all, thank you, Santosh, for having me on this podcast series and for that kind introduction. So, regarding distributed computing, broadly the idea is that, you know, nowadays we have a lot of computational problems where the size of the problem is so large that there's no way you can actually solve it on a single computer. So, a very classic example of this is the training of machine learning models. Nowadays, these models have huge data sets. For example, the way Google does its speech recognition: they use a very huge data set that comprises data from many different users, and they run machine learning models on it, what are called Deep Neural Network models, which have maybe billions of parameters. So, there's no way you can actually do it on a single computer. So you have to distribute this. And that brings along a lot of technical challenges in the area, such as: what is the best way to do it, how do you manage to do it fast, how do you handle memory requirements or, more accurately, memory constraints, and how do you deal with security within distributed computing. So, there are lots of such issues, but that's the kind of overarching need for distributed computing and things of that nature.
Could you describe some of the research projects that you are doing at Iowa State?
Currently, a significant part of my research program has been around trying to use some ideas from what is traditionally known as information theory within distributed computation. So previously, if you go back 20 years, when people talked about distributed computing, if a company wanted to employ distributed computing in their own work, they would first buy a bunch of different computers, set up a cluster of their own, and then use it for that purpose. But in the last, I would say, decade or so, there's been a proliferation of so-called cloud computing providers: Amazon is one, Google is another, Microsoft is another, Salesforce. All these companies basically allow you to rent a cluster. They give you a very wide range of machines that you can rent and use for your own purposes, and they charge you by the hour. So, one of the issues that comes up there is that many times these machines are not necessarily the latest and the best machines that you might get. And there could be significant variability in the worker node speeds and in their quality, depending on the load that the cloud provider has. For a distributed computing application, if you're not careful, the overall job execution time can in fact be dominated by the weakest worker, or the slowest worker. So if you're not careful, this effect might dominate your entire computation. So, one part of my work in the last two or three years or longer has been trying to understand how you can mitigate this effect: how you can employ distributed computation but not really suffer from being dominated by the performance of the slowest worker. So, that's been one thread of work within my group.
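[Editor's illustration] The slowest-worker effect described above can be sketched with a toy simulation. This is only a minimal sketch under assumed numbers, not a model of any real cloud provider or of the group's work: the function names, the 10 workers, the 10% chance that a worker is slow, and the 5x slowdown are all hypothetical.

```python
import random


def job_time(worker_times):
    """A job that must wait for every worker finishes when the slowest one does."""
    return max(worker_times)


def simulate(num_workers=10, straggler_prob=0.1, slowdown=5.0,
             trials=10_000, seed=0):
    """Average job time when each worker is independently slow with some probability.

    A normal worker takes 1.0 time unit; a straggler takes `slowdown` units.
    All of these parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [
            slowdown if rng.random() < straggler_prob else 1.0
            for _ in range(num_workers)
        ]
        total += job_time(times)
    return total / trials
```

With these assumed numbers, the chance that at least one of the 10 workers is slow is 1 - 0.9^10, or roughly 65%, so most jobs run at the straggler's pace even though each individual worker is usually fast. Calling simulate() with straggler_prob=0.0 gives the ideal job time of 1.0 for comparison.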
And other threads of work within my group have to do with trying to understand how we can deal with the effect of adversarial workers, broadly speaking, security within distributed computation. So, like I said about cloud computing platforms, here you're using computers that you have no real reason to trust. There is a chance that certain entities that you're using have been compromised: either the data that you store on them can be leaked to other adversaries, or they could be trying to corrupt your overall job in such a way that you can't do what you're trying to achieve. So the question there is, how do you address these issues? Can you still compute in the presence of these adversaries, how do you do it, and how do you do it efficiently? So that's another line of work.
On a related question, what are some of your accomplishments that you're proud of?
Of late, we've had some, I would say, very interesting findings within distributed matrix computations. The idea, or at least the problem statement, is actually fairly straightforward. Let's suppose that you are computing just a matrix-vector multiplication or the product of two matrices. How should you split them and how do you distribute them across multiple worker nodes such that, first of all, like I said, this effect of the slowest worker slowing you down doesn't happen? And while doing that, you also want to make sure that the fidelity with which the product is computed, whether a matrix-vector product or a matrix-matrix product, is not hurt by round-off errors and so on and so forth. So recently there had been some work in several communities where they had discussed certain approaches. And what we found in our group was that, critically, these are not numerically stable. So we found that they suffer from some very, very serious issues when the dimensions of these matrices become very large or the size of the problems becomes very large.
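[Editor's illustration] To make the problem statement above concrete, here is a minimal sketch of one textbook way to split a matrix-vector product across three workers so that the result survives the loss of any one of them. It is not the method developed in Professor Ramamoorthy's group, and the interview does not say which published approaches were found to be numerically unstable. The function names and the "top"/"bottom"/"parity" labels are hypothetical, and the sketch assumes the matrix has an even number of rows.

```python
def matvec(rows, x):
    """Plain matrix-vector product on lists of lists (what one worker computes)."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into top and bottom halves and add a parity block (top + bottom)."""
    if len(A) % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    half = len(A) // 2
    top, bottom = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(top, bottom)]
    return {"top": top, "bottom": bottom, "parity": parity}


def decode(results):
    """Recover the full product from the results of any two of the three workers."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if "top" in results and "bottom" in results:
        return results["top"] + results["bottom"]
    if "top" in results:
        # bottom half = parity result minus top result
        bottom = [p - t for p, t in zip(results["parity"], results["top"])]
        return results["top"] + bottom
    # top half = parity result minus bottom result
    top = [p - b for p, b in zip(results["parity"], results["bottom"])]
    return top + results["bottom"]
```

Each worker is given one block from encode(A) and returns matvec(block, x); the master calls decode as soon as any two results arrive and ignores the third. The subtraction in decode is exact for small integers, but with floating-point data and many more workers the recovery step becomes a linear solve whose conditioning matters, which is where the round-off concerns mentioned above come in.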
And we were able to provide some very, very interesting solutions to it, which were backed up by theory and extensive experiments as well. So that was, I would say, kind of a recent breakthrough in the area.
So, on a related question, where is matrix multiplication applied? Is it pervasive throughout distributed computing?
So, I would say matrix-vector multiplication and matrix-matrix multiplication, these two things are kind of the workhorse of any kind of scientific computation that you do. So let me go back to that training of machine learning models that I was telling you about. Basically, the way you train a large-scale machine learning model is by a technique known as Gradient Descent, which is an optimization technique. And more often than not, every iteration of Gradient Descent, that is, every step of the algorithm that is used for training, involves one or more such basic matrix operations. They also appear across the board in things like scientific computation. If you are trying to solve partial differential equations numerically, then it's a sequence of repeated such matrix operations.
In general, how is the research in universities and academia different from that in industry within this field?
I don't have a very good answer for this. But broadly speaking, what I can say is that universities tend to take a slightly more long-term view and a more fundamental view of problems, whereas industry typically is interested in things in the shorter term. Especially in the machine learning context, one critical difference is that companies have very massive infrastructures that they can use to great effect in some of these problems, and universities don't have that luxury. In many cases, it's a challenging thing.
But by and large, I think philosophically, within a university context, we are trying to look at slightly more fundamental problems, trying to understand problems from a more basic and fundamental perspective rather than trying to do exhaustive experimentation, which sometimes is not feasible, like I said, or sometimes is not the end goal. So, I think the difference is more one of timescale and also of what we are really examining: are we trying to get a more fundamental understanding of, if something works, why does it work, how can we make it better, and what are some fundamental limitations of the techniques.
So, what is your approach towards starting a new research project? Where do you find new problems, how do you chase different solutions, and how do you find what the optimal solution is? Is that through reading papers, going to conferences, or industry collaborations?
Anytime I've started something new, it has had to be kind of organic, from ground zero. So basically I have typically taken the approach of immersing myself in the basic literature in the area, trying to understand what are the well-established things that people have already figured out. And then, yes, definitely attending conferences and talking to people who are working on topical problems or attending their talks, that all helps tremendously! But yeah, there's always a startup phase where you are kind of floundering in the dark a little bit, but I think that's good and that's how it should be.
So, within the campus here and within the department itself, what computing facilities do you have that support your research?
I've had kind of a mixed approach. So, like I said, a lot of my work has been motivated by a lot of the issues people face within cloud computing platforms.
We certainly rent time on the cloud a fair bit, specifically on AWS, Amazon Web Services. But within Iowa State, there is a pretty big High Performance Computing group, and I have used their computing clusters as well. They actually have many tiers of clusters. So there's a paid cluster, there's a free cluster, there are clusters for teaching, and they come with a variety of different machines with different computing powers. They also have a lot of GPUs that they have recently procured. So I have used a variety of them. And I think Iowa State overall has a pretty good infrastructure to support computationally intensive work.
So, how can our new students and current students have access to these high performance computing facilities?
Typically the way it works is that, if you are a Ph.D. student, let's say, or a master's research student within a particular group, your advisor can ask for access to these clusters. And again, like I said, there are different tiers. So there are tiers that require the principal investigator to actually contribute towards the operation of the cluster, which is to say that they have to buy certain nodes, and then people use them in a shared fashion. But there are other clusters that are available for free, so you just get access to them because you're a member of the faculty.
Do you think the learning curve is steep to use these computing facilities?
I don't think it is necessarily very steep. It also depends on exactly what you're trying to achieve. There are certain things that simply mean that you have multiple nodes rather than just a single node. There are some slightly more sophisticated techniques such as MPI, which is Message Passing Interface, where you are trying to program, at a slightly lower level, how the computers are working together.
But I would say that a lot of the open source packages that have recently come about, within Python for example, PyTorch and many others, make it a lot easier to do these kinds of things nowadays.
Yeah. So, moving on, could you describe your leadership role within the department? What are some of your achievements in this role, and what's your vision for it?
I have an administrative appointment as the Director for Student Professional Development. At least one part of my role has been to try to, I would say, enhance the overall student experience, whether at the undergraduate level or at the graduate level. I have taken this role on and I've tried to make students more aware of a lot of different career and higher education opportunities. When I say higher education, I mean graduate school opportunities that students have. There are a lot of things that students might routinely miss, and I've tried to make sure that they're aware of these opportunities. I've also tried to expose students to what graduate school might look like, what is expected of them within graduate school, and what it takes to apply for very competitive fellowships, NSF graduate fellowships or the Department of Defense graduate fellowships and so on and so forth. More recently, we have started a program within the department where we have encouraged a lot of junior-level and senior-level undergraduate students to get involved in undergraduate research. And so last year, we ran a small internal competition where faculty members were provided some seed funding to help recruit undergraduate research students into their own groups. Overall, I think that was an effort that was very well received and went very well. We're still following up on how things went and how we can run it again this year.
Lastly, and this is largely due to the pandemic in my opinion, we've noticed a significant uptick in mental health issues among many of our students, whether at the undergraduate level or the graduate level. So, I have been involved in organizing workshops that are tailored towards people trying to seek out mental health resources, and towards making them aware of what is available on campus and what kind of help they can expect. That was something I did last semester as well.
So how can students be aware of any of these activities within professional development? Do they go to the department website or do they contact you?
Normally, any kind of event, activity, or information session that we run, we make sure that it's always advertised at least, I would say, two weeks in advance on the department website and also in our newsletters. So that's how we typically have gone about it.
So how do you recruit students in your group?
As far as graduate students are concerned, typically we look at the applicant database. Normally I don't recruit a student directly just by looking at their record. I do try to make sure that I go over all the students in the database. I try to make an initial short list of people who might be suitable for my group. And following that, I talk to every individual student before I make a decision on whether I should pursue them further or whether I think they will be a good fit for my group.
And what about undergraduate students? Can they directly email you and come to your lab and talk to you?
Yeah. So that's something that I've typically encouraged. Occasionally, I have been in touch with Student Services as well to tell them that I'm looking for an undergraduate student.
Our next question is on your teaching responsibilities within the department. Could you tell us what courses you teach?
So, at the undergraduate level, I nominally teach a couple of different courses. I usually teach CprE 310; the course title is "Theoretical Foundations of Computer Engineering". This is a course that talks about the basic discrete mathematics that underlies much of computer engineering. So we cover graph theory, mathematical induction, propositional logic and things of that nature. I also teach the undergraduate Digital Communications class, which is a senior-level elective class. And sometimes Signals and Systems, which is a required course. Typically students take it in either sophomore or junior year. And at the graduate level, I've taught a bunch of different courses. I've taught mathematical optimization, I've taught courses on probability and random processes, and several other courses of that nature.
So why is it important for our students to take these courses?
As far as CprE 310, which I just spoke about, that's a very, very basic course. It's also a required course for computer engineering. It exposes you to all the basic building blocks of what you commonly use for programming, what you commonly use for proofs, trying to understand rigorously why some things work and some things don't work. As far as Signals and Systems is concerned, that's, again, a required course for the electrical engineering majors. It's all about trying to understand waves, trying to understand signals and how to model them and how to process them.
So, as a student, what skillsets are needed from my end to excel in this area? Do I need a strong programming background? Do I need a strong mathematics background?
I think it's very hard to decouple these things. So, I would say that having a good handle on Python programming, or at least one other programming language like C++ or Java, is almost a necessity. It's almost like the English language.
I think, regardless of whether you are an electrical engineer or even, I would say, a mechanical engineer, these skills are very, very important. So, I view that as a very hardcore baseline. And if I view that as a baseline, then yes, being mathematically mature, trying to be strong in your basic mathematical skills, that goes a long way in making sure that you do well in our courses and also do well overall in academics or industry.
So, a follow-up question: what kinds of career paths are chosen by your graduating students? Do they go into industry or do they go into universities?
I've had, like, seven Ph.D. students graduate. Two of them have gone on to more academic careers. I have one student who has been a faculty member at Missouri Science and Technology on tenure track and another student who's joining a postdoc position at Purdue soon. The other students that I have had have gone on to industry. For example, many of my students have gone on to large tech companies in the Bay Area, including Google, Pinterest and Seagate and so on and so forth. And one of my students, interestingly, went on to a slightly nontraditional path. He's now on Wall Street; he's a quantitative developer at a hedge fund.
So, any final word of advice for our students?
So broadly, from an incoming student standpoint, I would say that we have a very good program here at Iowa State with a lot of very interesting projects across different groups. I would strongly encourage students to try to reach out to faculty members who they think might be interested in their area, and to try to make an informed decision on whether to apply and how to orient their application in such a way that people take notice of it. Overall, I think these are exciting times to be in the broad Data Science and Machine Learning area. I think there's a lot of scope for explosive growth in the area.
And I think if you position yourself right, you could be in for a very successful and rewarding career.
Thank you. Thank you so much for your time. I think we learned a lot today.
Thank you!
Thank you!

Santosh Pandey

Host