Podcast

Episode 41 - Decarbonizing AWS with Adrian Cockcroft

June 25, 2024 - 3 minutes reading

Green IO E41 focuses on methods to decarbonize a hyper-scaler, using insights from AWS. Gaël Duez meets Adrian Cockcroft, an influential cloud architect, who has occupied key roles such as VP AWS Marketing, Cloud Architect at Netflix, as well as Distinguished Engineer roles at eBay and Sun. He now actively volunteers with the Green Software Foundation, where he leads the “Real Time Cloud” project.

Listen to the full episode here.

Wrap Up Article

Decarbonizing the cloud – an energy-led approach

While AWS remains the biggest public cloud provider worldwide, it has also received criticism related to its efforts to reduce its environmental footprint. Yet as Adrian knows all too well, there have been, and still are, highly motivated employees who are trying to move the needle in the right direction.

Overall, it is difficult to assess the carbon footprint of software companies as they are part of a wider supply chain, both consuming services and providing services to others. In Amazon's case, it is important to note that the carbon footprint is much larger than that of other providers, such as Microsoft and Google, due to the wide variety of activities involved, including transportation and manufacturing, both of which are carbon-intensive.

Generally, the energy industry has been decarbonizing, a shift driven primarily by cost savings, with a focus on solar and wind energy production. Switching to renewable energy providers therefore helps to cut emissions when running large IT stacks. The recent expansion of AI workloads globally has led to an overall increase in energy demand. Large cloud providers are now investing in Power Purchase Agreements (PPAs), where external enterprises are contracted to build, for example, a wind or solar farm, and the cloud providers procure 100% of the power output. This development means that Amazon is currently the largest commercial purchaser of energy. Competitors without PPAs have had to procure additional energy supplies to power the data centers coming on stream, and these supplies are, for the most part, derived from fossil fuels.

Cutting emissions in-cloud and of the cloud

Adrian emphasizes that there is a difference between the sustainability of the cloud and sustainability in the cloud, terminology he coined while at AWS. Sustainability of the cloud covers subjects such as hardware, building infrastructure, and energy production and use. It is solely the responsibility of cloud providers to reduce this part of the footprint. Reducing the carbon footprint in the cloud falls mostly on cloud users, and here Adrian suggests asking where a batch workload will be run, and also when it will be run. Smoothing out peaks to reduce utilization and demand on cloud providers, running batch jobs for longer on fewer machines, and desynchronizing deployments by adjusting job schedules are just some of the actions which help deliver concrete results. It’s worth remembering that any additional energy demand is often met by gas, because gas-fired plants are easy to ramp up and down to follow fluctuating demand (unlike wind or solar). Yet there are solutions on the market to help find out the marginal production percentage. For example, WattTime provides data on real-time energy production, though access to such hourly production data via their API requires a subscription.

Data storage and architecture options to reduce carbon emissions

Choosing the right type of storage for data is another way to help reduce carbon emissions, and here the concept of embodied energy needs to be taken into account, as well as actual power consumption. SSDs are typically used for high-performance, high-availability storage. Their energy use in operation is low, yet high levels of energy are used to manufacture them due to the quantities of silicon needed, most of which is manufactured in Asia. With spinning disks it is the reverse, as the cost and impacts of manufacturing are less intense compared to SSDs. Yet a 10 W motor is constantly spinning the disk, which means power consumption is high, even when not in use. For some, tape can be a useful option as, once written and stored, the tape itself consumes no energy whatsoever. Additional options include compressing data for long-term storage, and adopting serverless architecture, where compute capacity is shared with others, and so companies only pay for exact use. This is an efficient option for allocating workloads that are particularly ‘spiky’ in nature.

 

Nuanced metrics

Adrian outlines two basic ways of calculating carbon. The location-based method measures the carbon footprint based on where the facility is located. On-site energy production is taken into account, but off-site energy production fed into the grid (e.g. through PPAs) is not. The market-based method, usually used for corporate reporting, is based on what goods and services have been paid for. So, buying ‘green energy’ and offsetting might lead to “carbon neutral” results. In general, real-time cloud data is difficult to obtain, and it is difficult to compare hyper-scalers as they don’t provide the same types of data. One frequent way to get data is to use billing stats. This can be done via an open-source tool such as the Cloud Carbon Footprint tool (maintained by Thoughtworks). It estimates carbon emissions for cloud workloads across the major cloud service providers, querying cloud APIs for detailed billing information and matching the resource usage data with multiple sources to track carbon emissions. Another metric is SCI (Software Carbon Intensity), which uses marginal data. It normalizes emissions down to a functional unit, so that figures can be reported per customer, per event or per transaction. Normalizing the workloads down to such a level means measuring the energy used by the data center directly, though this does not include any of the power that is generated off-site by the companies. Therefore, SCI and Cloud Carbon Footprint would report a much higher carbon number than cloud provider audits. This shows how difficult it is to compare across different organizations, as different data sets are used: for example, Google uses location-based data, whereas Microsoft and AWS use market-based data.
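To make the SCI idea more concrete, here is a minimal sketch of the calculation as defined in the Green Software Foundation's SCI specification: SCI = ((E × I) + M) per R, where E is the energy consumed, I the carbon intensity of that energy, M the embodied emissions and R the functional unit. The function name and all the numbers below are illustrative assumptions, not figures from the episode.

```python
def sci(energy_kwh: float, intensity_g_per_kwh: float,
        embodied_g: float, functional_units: float) -> float:
    """Return grams of CO2e per functional unit: ((E * I) + M) / R."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * intensity_g_per_kwh  # E * I
    return (operational_g + embodied_g) / functional_units


# Hypothetical service: 12 kWh consumed on a 400 gCO2e/kWh grid,
# 1200 g of amortized embodied emissions, one million transactions served.
score = sci(energy_kwh=12.0, intensity_g_per_kwh=400.0,
            embodied_g=1200.0, functional_units=1_000_000)
print(f"{score:.4f} gCO2e per transaction")
```

Choosing a different functional unit (per customer, per event) only changes the value passed as `functional_units`; the rest of the calculation stays the same.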

Adrian also broaches the question of sovereignty. If companies own the data centers, then metrics on what, when and how machines are being powered are much more easily obtained.  Yet obviously there are associated costs plus carbon footprints related to the hardware itself, including the related upkeep, cooling and housing of data centers. For those using outsourced cloud-based operations, with their shared infrastructure and associated resources and costs, measurements are more difficult to obtain due to the dynamic process of use(rs) combined with the mutualization of costs and impacts.


Future direction

Adrian believes that future contracts will need to look at spot pricing that takes carbon intensity into account, not just the cheapest price. Tools and methodologies will need to be developed according to the different boundaries defined, whether workload, service, organizational or planetary. Getting involved is crucial too, underlining the importance of working together across the tech sector. Adrian is heavily involved in standardizing the GSF Software Carbon Intensity metric, a project which aims to be a focal point for communication with the cloud providers. The goal is to make the models more useful and more standardized, even if they will never be perfect. Anyone interested can visit the public Miro board on the GSF GitHub account, where current explorations concentrate on regional metadata, carbon coefficients, electricity maps, PUE ratings etc. In the face of the current political and societal turmoil, the myriad initiatives currently underway are testimony to the tech sector’s collaborative approach. Sensitive issues can easily divide communities, but they can also bring them together.

Transcript

Intro 00:00
A video conferencing company used to have an outage every Tuesday morning at 09:00 because that's when everybody wanted a video call at once. Their systems would keel over and they had to go beef them up for that one time of the week when there was a big spike in traffic.

Gael Duez 00:26
Hello everyone. Welcome to Green IO with Gael Duez. That's me. In this podcast, we empower responsible technologists to build a greener digital world, one byte at a time. Twice a month on a Tuesday, our guests from across the globe share insights, tools and alternative approaches, enabling people within the tech sector and beyond to boost digital sustainability. And because accessible and transparent information is in the DNA of Green IO, all the references mentioned in this episode, as well as the transcript, will be in the show notes. You can find these notes on your favorite podcast platform and of course on our website greenio.tech 

Before we start, I would love to share a personal anecdote with you. A few weeks ago I had to do a full day of painting in our recently refurbished house and I was happy because I would finally have the chance to catch up with dozens of podcast episodes that I couldn't listen to in the past weeks. I ended up listening to a fraction of them over the day actually, and it made me realize again how precious our attention is and how dedicating almost 1 hour to a podcast episode is a significant investment of it. And it made me feel sincerely grateful for the thousands of you allocating precious time every month to listen to Green IO. So in two simple words, thank you. And to honor the gift of your attention, today I am going to put on my Captain Obvious costume with two statements. First, AWS remains the biggest public cloud provider worldwide. Despite not having that much data on its Chinese competitors, I guess it's still a safe bet to say that. Second, IT sustainability experts and environmentally conscious CloudOps working on AWS are not ecstatic about the level of services provided by AWS to decarbonize a tech stack, to put it mildly. On the other hand, I had the pleasure to meet several AWS employees across Europe highly involved in climate change topics, among others, in trying to move the needle in the right direction internally.

So what can AWS users achieve today in terms of decarbonization? And what can they expect tomorrow? And what is the general trend for the cloud industry when it comes to reducing its environmental footprint in carbon emissions and beyond? Now, who would be a better guest to discuss what to expect at AWS and in the public cloud industry at large than the one who created and led the first sustainability team there? And who has been one of the most influential cloud architects of the last decade, as well as one of the managers behind Netflix's highly successful homepage? And before Netflix, he worked in some of the most iconic tech companies like eBay and Sun Microsystems. And yes, back then it was iconic. Adrian Cockcroft, since it's him we're talking about, retired in June 2022. But he has a very active conception of retirement. He still gives talks, does some consulting gigs and volunteers actively in the Green Software Foundation. So welcome, Adrian. Thanks a lot for joining Green IO today.

Adrian Cockcroft 03:46
Thanks. Thanks for having me.

Gael Duez 03:48
If we put ourselves in the shoes of a CloudOps, a very climate-aware CloudOps, what should I do to reduce the carbon emissions of my public cloud use today? And to be a bit provocative, why shouldn't I do nothing? Because, a, this is my cloud provider's issue, to become low carbon. And b, the electricity grid is being decarbonized every day a little more. So why should I care? What would be, according to you, the main actions and maybe some hurdles that we should pay attention to?

Adrian Cockcroft 04:23
I think there's two sides to it. The first thing that most people are looking at is measuring their carbon footprint. And it's not just from an academic point of view. It's because your users want to know what their carbon footprint is. So the one thing is that most software companies are in a supply chain. You consume services and you provide services to somebody else. And along with those services in that supply chain, you need to start communicating how much carbon it is. So I think this is one of the areas that people are starting to realize, is that if I know how much carbon to allocate to the activity of my company, then I need to find a way to allocate that, a proportion of that carbon to my customers via some algorithm for doing that. This is one of the areas which is a bit fuzzy, exactly how to do that. There's some fairly simplistic ways of doing it. 

So the first thing is measuring, and the reason you're measuring in many cases is because your customers are saying, I need to know how much carbon to allocate as a customer. And then the other side is, once you've measured it, then you can say, okay, where is this carbon coming from? And then I can say, well, most of the carbon is coming from here, so we can do some work to optimize that part of it down. And I put companies into two different classes, if you like. I have an old physics degree, so the way I like to think of it is companies that push electrons and companies that push atoms. Atoms are much, much heavier and take more energy. So if you're moving physical things in the physical world, that's atoms, and that's where most of your carbon footprint will be. But if you're a purely software company, say a SaaS provider, most of your physical carbon footprint is going to be employees and buildings, but the purpose of your company is to do software.

Software inherently has a pretty light carbon footprint. It's electricity, which is relatively easy to decarbonize. If you're in transport or manufacturing or anything like that, shipping, all of those kinds of things, most of your carbon is going to be somewhere else. So think of whether your company is purely a software company, like maybe a bank, or whether it's a manufacturing or some other kind of company that's moving things around in the world. And then if you're purely software, then most of your carbon footprint will be from your IT stack and you can kind of go in and focus on reducing that. If most of your carbon is from things on the atom side, you may actually want to do more software in order to optimize the hardware, effectively the physical processes. You might want to run some additional software jobs to optimize scheduling or resource usage in some way. So it kind of depends on where you're starting from, but that's how to think about it from my point of view.

Gael Duez 07:42
So I know it's a bit of a rabbit hole, but could we actually explore a bit why it is so fuzzy to measure a carbon footprint in the cloud, as you mentioned at the beginning?

Adrian Cockcroft 07:56
Well, if you have a data center, you know how much power is going into that data center and you know where you're getting your power from. And if that data center is completely used by your company, you can just allocate it across the different machines. You can measure everything. When in the cloud, you don't get to measure everything. And there's a lot of shared infrastructure. So one of the reasons the cloud is more efficient is because there's shared infrastructure across customers, and customers are coming and going dynamically. 
So that's one aspect you have to figure out. How much of that network switch carbon footprint should you allocate to the particular customer that is using some percentage of it? Do you even know how much percentage of use that customer is putting on that network? So it's quite hard to measure if you're trying to get a very fine grain accurate measurement. So that's one aspect of it. 
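As an illustration of the allocation question raised here, the simplest possible rule is to split a shared component's footprint in proportion to each customer's measured usage. This is only a sketch under strong assumptions: the function name, the switch footprint and the per-customer traffic figures are all hypothetical, and in a real cloud the per-customer usage is exactly the number that is hard to obtain.

```python
def allocate_shared_emissions(total_kg: float,
                              usage_by_customer: dict[str, float]) -> dict[str, float]:
    """Split total_kg across customers in proportion to their usage."""
    total_usage = sum(usage_by_customer.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {customer: total_kg * usage / total_usage
            for customer, usage in usage_by_customer.items()}


# Hypothetical network switch: 100 kg CO2e for the period,
# with traffic per customer measured in GB.
switch_emissions_kg = 100.0
traffic_gb = {"customer_a": 600.0, "customer_b": 300.0, "customer_c": 100.0}
shares = allocate_shared_emissions(switch_emissions_kg, traffic_gb)
for customer, kg in shares.items():
    print(f"{customer}: {kg:.1f} kg CO2e")
```

Other allocation keys (revenue, reserved capacity, time connected) would give different splits from the same total, which is part of why the result is fuzzy.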

And then when you're trying to figure out the carbon, there's really two aspects to it. Mostly it comes down to electricity and then the carbon used to manufacture and ship the equipment around. And it's mostly manufacturing. And if you look into it, most of the manufacturing is silicon. Most of the carbon involved in manufacturing computers is actually the silicon, which is very carbon intensive. And that silicon is almost all made in Asia. It comes from Taiwan, Korea, Japan, China. And the mix of energy there is very carbon intensive. Even if you're running on completely green energy, you still have to allocate what's called “Scope 3”, basically the supply chain of the machines that you're using.

Gael Duez 09:54
I think what is quite interesting also, and I don't remember which hyperscaler it was, so I will not drop any names. But because of this energy consumption boom, I've seen some articles explaining that we are basically turning on, again, older coal-powered facilities to provide electricity. That would actually mean that even if you use AI to, let's say, decarbonize something, the energy required might actually slow down the energy transition, because we are relying on older coal plants that were supposed to be decommissioned. But I don't know if this is information that you've also seen on your radar or if you have been able to double check it.

Adrian Cockcroft 10:42
Yeah, I think it's pretty clear coal is sufficiently dirty and uneconomic that it's being replaced very quickly. I'm not so worried about coal. That's very short term. If anyone's turning back on a coal plant, it's for a very short period of time. Mostly they're being shut down. The current argument is whether they should be building more gas plants. And gas is a lot cleaner than coal. It obviously puts out carbon, but it's the least carbon intensive of the carbon options. So it is better, but it's obviously not going to be good enough. And then the other thing with the sort of IEA projections and things is they've consistently underestimated the deployment of renewable energy by a large amount over many years. And it's not clear whether the current projections are still being too pessimistic about how much solar plus battery and wind and geothermal and things like that can be deployed. I think people are generally surprised every year by how much solar in particular is deployed. And wind is a bit easier to track because it takes longer to build. You can kind of measure the deployment of that a bit more easily at an economic level. So the optimist says that, well, renewables will step up and meet the demand. There's still a lot of opportunity to do that, and it's the cheapest form of energy. So when we say, well, good, the energy industry has been decarbonizing: they're not decarbonizing because they felt like it would save the world. They're decarbonizing because it's cheaper. The lowest cost energy you can get is solar right now and wind as well, although there's more capital investment to put up a wind tower. So that's kind of what's driving the energy industry.

Then on top of that we've got the large cloud providers who are all buying their own energy. They're investing in what's called Power Purchase Agreements or PPAs, and they will contract with somebody to build a wind farm and take all the power from it or build a solar farm. Most solar farms nowadays have batteries attached to them to extend the power and smooth things out. So that's developed to such an extent that the largest commercial purchaser of energy is Amazon. It's somewhere over 20 gigawatts, which was the last number I heard. It keeps going up exponentially year by year. The amount that Amazon is buying is growing extremely fast. And that's also true of Microsoft and Google, although the carbon footprint of Amazon is much larger than Microsoft and Google because it's a physical business. Amazon's shipping and manufacturing vastly more than Microsoft and Google, who are largely online businesses. A little bit of manufacturing, but not so much.

The other piece of news recently was from Microsoft, who have had a commitment for a while to decarbonize their business. But then they recently did a big expansion of their AI workloads and they actually started going backwards. So their carbon footprint was increasing rather than decreasing. And they said that they just hadn't planned that much AI development and they had effectively built data centers full of GPUs to run OpenAI and things like that. And they had not managed to invest in as many power purchase agreements to offset it in the short term. So I'm sure that they're trying to ramp that back up. But that was a recent announcement just in the last month or so that they are at least temporarily falling behind on their decarbonizing commitment. And it remains to be seen how the other cloud providers do on that because everybody right now is deploying as much GPU as they can and it runs really hot. That's a lot of power.

Gael Duez 15:11
I've got a question that I have to ask. I read that Microsoft is currently planning to open one data center per week, one new facility per week. I don't know how much it's an announcement to please investors, to show that they are actually fueling the AI boom, or if these are true figures, as there are many ways to count what is a data center. But do you believe that the business model of hyperscalers is sustainable somehow, because this is just so much energy consumption that is added to the electricity grid and so many resources? You mentioned manufacturing. We could also mention mining and all the environmental impact of mining to provide the metals within this silicon, as you've described it. How do you see the future in 10 or 20 years? This is exponential at the moment, as you said.

Adrian Cockcroft 16:09
Yeah, it's increasing rapidly right now, but the IT industry is only a few percent of the carbon emissions. Buildings are about 40%. I seem to remember transport's 25%, something like that. Mining is a big proportion. So to get a sense of perspective, yes, it's increasing. The main problems are with suddenly wanting to put up lots of data centers. You've got to find a place where you can source the power and the cooling requirements and get a building put in. It takes a little while. The big cloud providers have had lots of experience in standing up regions, but regions don't come into existence overnight. You can add buildings to existing regions maybe a little quicker than starting a new one, depending on where you are in the world. This is something that probably takes at least six months, maybe more than a year, to just get everything permitted and built out, unless you're expanding in an existing building. And one of the things is there's a lot of data center capacity that's basically empty or ready to be renovated, but that's generally lower quality buildings. And what we're looking at here is very high energy density, like the racks are very hot, and it's a specialized build to make sure you have enough cooling. So I think that's what's going on. The AI boom kind of caught people by surprise, and it's sort of working its way through. I think it'll settle down a little bit over the next year in terms of the hardware deployments. But the big companies, Meta and Google and OpenAI in particular, or Anthropic, those kinds of people that are training huge models, they have tens of thousands of GPUs that they're running for those, and each GPU. So that's an enormous amount of energy and capital deployment.

Gael Duez 18:15
We've barely scratched the surface, but could we go back to sustainability of the cloud?

Adrian Cockcroft 18:20
Yeah, but I actually came up with that wording. So historically, the security model for AWS used to talk about security of the cloud and security in the cloud, and what was the customer's responsibility, what was the cloud provider's responsibility. And that was something that was fairly well understood in the cloud environment as a security model. And when we came to do the sustainability messaging and write the Well-Architected guide, that was something where I kind of adopted those words, and in fact one of the same diagrams, and changed it and said, okay, let's talk about sustainability of the cloud, and then sustainability in the cloud, which is how you use it, and the interaction between the two. Because one of the things is, if you use the cloud, there's the allocation of energy, effectively, or carbon to you, but if you use it in a way that is better for everybody, then you actually reduce the cloud provider's need as well. So one of the things to think about is the impact of what you're doing. Like if you have a footprint where you run basically nothing at all, and then you run a flash sale thing for 1 hour once a week, and you need thousands and thousands of computers for 1 hour a week. And the rest of the week it's like ten computers. The carbon footprint of that is just the area under the curve. It's the number of compute hours that you have allocated to you. But if you think about the cloud provider, they have to have thousands of machines for that 1 hour, and those machines don't disappear at any other time. So they're still there, hopefully with somebody else using them. But what having a very spiky workload does is increase the capacity that the cloud provider has to have. And overall it makes it harder for them to have high utilization.
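The "area under the curve" point can be sketched with a little arithmetic. The two weekly profiles below are hypothetical and chosen so that both use the same number of machine-hours; the only difference is the peak that the provider has to keep capacity for. A real flash sale cannot simply be flattened like this, so the flat profile is there purely to compare shapes.

```python
HOURS_PER_WEEK = 168


def summarize(profile: list[int]) -> tuple[int, int, float]:
    """Return (machine_hours, peak_machines, utilization_of_peak_capacity)."""
    machine_hours = sum(profile)   # area under the curve, billed to the customer
    peak = max(profile)            # capacity the provider must have available
    utilization = machine_hours / (peak * len(profile))
    return machine_hours, peak, utilization


# Ten machines all week, plus a one-hour spike (hypothetical numbers).
spiky = [10] * (HOURS_PER_WEEK - 1) + [5050]
# The same machine-hours spread evenly across the week.
flat = [40] * HOURS_PER_WEEK

for name, profile in (("spiky", spiky), ("flat", flat)):
    hours, peak, utilization = summarize(profile)
    print(f"{name}: {hours} machine-hours, peak {peak}, utilization {utilization:.1%}")
```

Both profiles are allocated the same compute hours, but the spiky one leaves the capacity it needs idle for almost the whole week unless other customers fill the gap.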

So one of the things you can do to reduce your utilization, and also make the cloud provider not need to have as much capacity, is to smooth out peaks. And there's a few techniques for that: running batch jobs with fewer machines for longer, or perhaps scheduling jobs so they don't all run at the top of the hour, very time synchronized. You sort of move them around a little bit so they're offset, because people tend to schedule things on nice round numbers in terms of time, and that means that everybody schedules something for the same time. One of the companies I used to work with was a video conferencing company, and they used to have an outage every Tuesday morning at 09:00 because that's when everybody wanted a video call at once. Their systems would keel over and they had to go beef them up for that one time of the week when there was a big spike in traffic. So that's just sort of calendar clocking.
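One simple way to move jobs off the top of the hour is to give each job a stable offset inside the hour. The sketch below derives that offset from a hash of the job name, so it stays the same from run to run without any coordination. The function and job names are hypothetical; this is one possible approach, not something described in the episode.

```python
import hashlib


def start_offset_seconds(job_name: str, window_seconds: int = 3600) -> int:
    """Return a stable pseudo-random offset in [0, window_seconds) for a job."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer, then fold it
    # into the scheduling window.
    return int.from_bytes(digest[:8], "big") % window_seconds


for job in ("nightly-report", "log-compaction", "billing-export"):
    offset = start_offset_seconds(job)
    print(f"{job}: start {offset // 60:02d}m{offset % 60:02d}s past the hour")
```

Because the offset depends only on the job name, different jobs end up at different points in the hour while each individual job keeps a predictable start time.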

So those are some of the things that make it easier for the cloud provider themselves to use less carbon, because they can run higher utilization when the load across all of their customers is more independent and there's less coordination. Coordination across the customer base comes from external events like Black Friday or shopping days, or Christmas or New Year's Day, or a big sporting event, the Super Bowl, whatever. Those events are where the problem is. So you want to try and spread things out as much as possible. And that's why cloud providers are generally more efficient, because they have statistically averaged across many different industries and many different customers. The average across all of those becomes a much smoother workload than if you're trying to manage a small number of customers or one customer in a data center.

Gael Duez 22:22
Absolutely. And so time shifting seems to be a very easy and straightforward way to reduce the environmental footprint of a public cloud user. Do you also have some advice when it comes to architecture, data storage, or even the way we code?

Adrian Cockcroft 22:42
Yeah, just thinking about storage. The most high performance, high availability storage typically has more copies of the data, and typically on solid state disks. And then if you get more into capacity storage, it's more on spinning disks. Now, the difference between the two is interesting, because we talked previously about Scope 2 and Scope 3. The energy use of a solid state disk is extremely low. In fact, the energy used to manufacture a solid state disk, because there's so much silicon in it, is typically more than the energy it will use in its lifetime. So the energy used to manufacture that SSD is whatever it is in tons of carbon, and then you just power it for several years, and you've generated less carbon than that because it uses so little electricity. If you take a spinning disk, it's the other way around. They're made of iron oxide and a bit of metal, and they're fairly well automated, and the carbon footprint of manufacturing isn't too bad. But they have like a ten watt motor spinning all the time, spinning that disk. So they consume quite a lot of power, even if you're not using them. And then finally, you look at tape. And tape is great because you write it to tape, and the tape sits somewhere, gets put in a cabinet or whatever, and just sits there and consumes no energy. So one of the techniques you can do with storage is to try and archive to tape, because tape is extremely low carbon. And that's kind of the hierarchy in terms of storage. Another thing you can do is just compress things better. You should always be compressing long term storage anyway. But we did some work at Amazon, which did get published, where we switched from gzip to Zstandard compression and saved about 30% on storage for one of the large internal workloads.
And that was an exabyte or something, because several exabytes of data was being compressed and it went from like, I guess three or four exabytes to two or three exabytes, something like that. So that's kind of the thing you can work on from the storage point of view. It's basically keeping data for less long, compressing it and archiving it to tape in particular, if you can. 
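To put the "ten watt motor spinning all the time" into numbers, here is a small worked calculation. The 10 W figure comes from the conversation; the grid carbon intensity of 400 gCO2e/kWh is an assumption chosen purely for illustration, as are the function names.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_watts: float) -> float:
    """Energy drawn in one year by a device running continuously."""
    return power_watts * HOURS_PER_YEAR / 1000.0


def annual_emissions_kg(power_watts: float, grid_g_per_kwh: float) -> float:
    """Operational emissions for one year at a given grid carbon intensity."""
    return annual_energy_kwh(power_watts) * grid_g_per_kwh / 1000.0


hdd_kwh = annual_energy_kwh(10.0)           # 10 W motor, always spinning
hdd_kg = annual_emissions_kg(10.0, 400.0)   # assumed 400 gCO2e/kWh grid
print(f"Spinning disk: {hdd_kwh:.1f} kWh/year, about {hdd_kg:.1f} kg CO2e/year")
```

Under these assumptions a single always-on disk draws 87.6 kWh per year whether or not anyone reads from it, which is why idle data is a better fit for tape.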

And then I like serverless architecture. Most workloads are pretty lightly utilized. Your machines are idle most of the time, but you have a big spike in work every now and again, particularly corporate IT type workloads. And I think serverless is the best approach there. AWS Lambda is particularly efficient at that. You're sharing your compute capacity with a large number of other people, and so you only get to pay for exactly what you use. And that's a very efficient way of thinking about how to allocate, particularly workloads that are sort of spiky in nature. And obviously, if you've got something that's running continuously, then it's going to want some more dedicated capacity and you can run more efficiently. If you're running 24/7 effectively with a very constant, predictable workload, you can do that better with a containerized workload. So there's different options there.

Gael Duez 26:25
Once you've done the basic homework, I would say, would you advise people to shift to a bit more carbon aware computing, carbon aware cloud operations? I know that you've got a nuanced opinion on it because there can be some unintended consequences, just to quote you.

Adrian Cockcroft 26:47
Yeah, I could talk a bit about that distinction, but the basic questions you've got, if you've got a workload, you've already tuned it up, you think it's running as efficiently as it could run, then the question is, where are you going to run it? When are you going to run it? If you can choose when, if it's a batch workload, you can decide, well, do I want to run it at night because demand is less then.

Do you want to run it during the day because there's a lot of solar energy wherever you live, and that's curtailed? So there's this idea about what's the marginal difference. Like, if you ask for one more kilowatt of energy from your grid, where does it actually come from? Not what's already there, but where does the additional kilowatt come from? In many situations there's the baseline of some wind and some hydro and nuclear, whatever, and those are running all the time. And then the variable amount on top is often gas, because it's easy to spin up and down a gas powered station. They're particularly good at that. So a lot of what are called peaker plants deal with the peaks in load. So when you say, I'm going to use a little bit more capacity to run this workload, if you do it at night, you're mostly probably going to be running on a gas peaker plant. And then you need to look during the day and say, well, how much solar is there? If you're in California during the day, there's an excessive amount of solar, and they're actually turning windmills off and not using all the solar power that they could generate, because there's nowhere to put it. So, running during the day: for example, I have an electric car, a Tesla, and it has an option that says charge whenever there's excess solar energy. I have Tesla batteries in my house and solar panels and a car. And basically the panels charge up the battery for the house. And then, after taking out whatever's being used to run the house, it takes whatever's left over and puts it in the car until the car's finished charging. And it just makes sure that the car is always charging off fresh solar energy, which has been transmitted several feet from the inverter to the car. It's not going across the transmission line. So it's the most efficient use of that energy. And that's a standard option in the way they're set up.
So that's kind of the, what you're trying to look for is what is this marginal use of energy? And so the question is, how can you find out what the marginal production percentage is?
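As a rough illustration of the "when are you going to run it?" question, the sketch below picks a start hour for a batch job from an hourly forecast of marginal grid intensity. The function name `best_start_hour` and the forecast values are hypothetical and made up for illustration; they are not data from WattTime or any real grid.

```python
from typing import Sequence


def best_start_hour(marginal_g_per_kwh: Sequence[float], job_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest
    summed marginal intensity (gCO2e/kWh) for a job of `job_hours` hours."""
    if job_hours <= 0 or job_hours > len(marginal_g_per_kwh):
        raise ValueError("job_hours must fit inside the forecast")
    best_start, best_total = 0, float("inf")
    for start in range(len(marginal_g_per_kwh) - job_hours + 1):
        total = sum(marginal_g_per_kwh[start:start + job_hours])
        if total < best_total:  # strict '<' keeps the earliest of equal windows
            best_start, best_total = start, total
    return best_start


# Made-up 24-hour forecast: gas at the margin overnight, surplus solar midday.
forecast = [450.0] * 9 + [120.0, 80.0, 60.0, 60.0, 80.0, 150.0] + [450.0] * 9

start = best_start_hour(forecast, job_hours=3)
print(f"Start the 3-hour batch job at hour {start}")
```

With a real marginal forecast, the same comparison would be repeated whenever the forecast is refreshed, since the marginal source can change hour by hour.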

Gael Duez 29:43
That was my question, because it's not that easy to find.

Adrian Cockcroft 29:48
Well, there's a company called WattTime that produces that data. And the problem is that you can get the data hour by hour, and you can even get it as a prediction for the next hour, but you have to pay for it. It's a commercial company, so you have to pay for a subscription to their API. They provide more general data, and data that's a little older. I think they have one region in California where they provide full access to the data, so you can experiment, but only for that region. So the good data is something you have to pay for, perhaps, if you want to do it.

Gael Duez 30:29
And do you believe that at some point this marginal grid approach, which I like to call grid aware computing, which goes beyond carbon aware computing, do you believe that a service like WattTime would be provided or even incorporated in some IT solution, in some cloud provider solutions, as a very basic API call that you do before launching any job, any AI training, whatever?

Adrian Cockcroft 31:03
I think it's unlikely to happen in the short term. I think it's more about having it built into the cloud provider. One of the ideas that's been floated would be to change the way the spot market works, so that the pricing in the spot market is effectively driven by the carbon impact, if you see what I mean. So you've got spot pricing for different resources, and when you're using spot pricing, you're looking for the cheapest resource. If you could price that resource so that it took carbon into account, so the cheapest wasn't really what was cheapest to run, but what was lowest carbon, then that might be an interesting way to get people to use the low carbon resources. Whereas if you've got something sitting in the spot market, an unused resource, but it happens to be a high carbon resource, you'd rather people didn't use it, because it's better to have it sit idle and consume minimal power than to have somebody using it. So you could have differential spot pricing at different times and around the world, and for different instance types. You could bake that in in some way. I don't think the cloud providers are likely to put in their own detailed carbon footprint. One of the problems is that the data you've got is very hard to measure. So if you really want to know what the carbon footprint at a particular point in time is in the cloud, the data isn't actually available until sometimes a month or so later, depending on how accurately you want to know. It's one of these things where you can get a rough estimate in real time, but if you want to have something that's going to pass an audit and be reported as a corporate number, you want to be more accurate and you have to wait until everything settles and you get all the numbers from all the different places. And in some parts of the world you can get numbers fairly quickly, but in other parts of the world, the automation isn't there, and these are global providers.
So if you look at it as, well, I want to provide this data globally: I can give you the data in North America or Europe quickly, but in Asia it's going to take me a month. Well, I'm not going to have different data for different regions. I'm just going to wait a month everywhere. So that's one of the aspects that affects the providers right now. The data you get from them that is for audit purposes is generally delayed by two or three months. In Amazon's case it's three months. I think in Microsoft's it's about two months, something like that. And that's purely so that they can gather the data from everywhere around the world and give you a final number that they don't have to revise when they get more data in. And that's one of the reasons why it's annoying. So this is great for reporting the carbon footprint of a company. It's useless for tuning a workload, because you run the workload, you run it again, you've got to wait three months, and the numbers are blended into something else. It's not helpful. So the tool people use for real time, if you like, is an open source tool that was largely developed by Thoughtworks called the Cloud Carbon Footprint tool. And that works off the billing records, because you pretty much know how much you're going to be charged for something. Billing data comes out. You know, I think there are hourly billing data records out of AWS, but certainly you can get daily, and from that you can see all the resources you were being billed for. And then they've got a way to turn those billing reports into a carbon estimate. Again, the data you get there will be different to the data you get from the audit report a few months later. They're not related, they're estimates, and they include different things and they're calculated differently. So you end up with multiple different ways of calculating carbon which will not match. They're just based on different algorithms.
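To make the billing-record idea concrete, here is a minimal sketch of turning billed usage into a carbon estimate: billed hours times an assumed average power draw, times an assumed PUE, times an assumed grid intensity. Every name and coefficient below is hypothetical and chosen for illustration; this is not the Cloud Carbon Footprint tool's code or its published coefficients.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BillingLine:
    service: str        # hypothetical service category from a billing export
    region: str         # hypothetical region label
    usage_hours: float  # hours billed for this resource


# All coefficients are assumptions for illustration only.
ASSUMED_WATTS = {"compute": 10.0, "storage": 1.0}               # avg watts per billed hour
ASSUMED_PUE = 1.2                                               # data center overhead factor
ASSUMED_GRID_G_PER_KWH = {"region-a": 400.0, "region-b": 50.0}  # gCO2e per kWh


def estimate_kg_co2e(lines: Iterable[BillingLine]) -> float:
    """Estimate emissions in kgCO2e from billing lines using the assumed coefficients."""
    total_grams = 0.0
    for line in lines:
        it_kwh = line.usage_hours * ASSUMED_WATTS[line.service] / 1000.0
        grid_kwh = it_kwh * ASSUMED_PUE
        total_grams += grid_kwh * ASSUMED_GRID_G_PER_KWH[line.region]
    return total_grams / 1000.0


bill = [
    BillingLine("compute", "region-a", 1000.0),
    BillingLine("storage", "region-b", 5000.0),
]
print(f"Estimated footprint: {estimate_kg_co2e(bill):.2f} kgCO2e")
```

Because every factor is an estimate, a number produced this way is useful for comparing two variants of a workload under the same assumptions, and it should not be expected to match a provider's audited report.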

Gael Duez 35:18
And Cloud Carbon Footprint relies on the Electricity Maps approach, which is the carbon intensity of the electricity grid at some point in time, which is slightly different from the approach that you advocated for, which is a marginal grid intensity. So obviously in the long term, we can safely bet that all the decisions will manage to align. But in the short run, we could have this counter effect of highly optimizing some workloads or shifting services and actually triggering extra fuel-powered capacity.

Adrian Cockcroft 35:58
Yeah, it comes down to what you want to use this measurement for. That drives the number you'll get. So if you are trying to measure carbon in order to report it as part of a corporate "this is the carbon footprint of my company", that's one use case. If you're trying to measure a workload so that you can optimize it, you want to measure two different variations of a workload. You're doing that kind of benchmarking thing, and then Cloud Carbon Footprint is good for that because you can just hold things constant and say, okay, we're in the same region, but how much carbon is this workload? And which portion of the workload is most of the carbon coming from? Those kinds of questions. If you're trying to do real time optimization of where to run a workload, then you want to use the marginal number. And there's another metric called SCI, the Software Carbon Intensity, and that uses the marginal data. So the idea there is you're looking at a particular transaction, say a customer logging in or looking up their account or whatever, so that you'd be able to take that single transaction and say, what is the carbon per customer event, per transaction, if you like. So they're trying to normalize the workloads down to that level. A customer visits a web page, or whatever a particular piece of your business logic is. What is the purpose of your company? What is a unit of that? If you're Netflix, you showed somebody a movie; that is a transaction, effectively, for Netflix. What is the carbon footprint of showing that customer that movie, or in general, showing customers movies at some point in time? And they use the marginal intensity by default for that. And again, this is location based, which means you're measuring the energy used by the data center directly, and it doesn't include any of the power that's generated by the companies.
So you end up with a much higher carbon number for SCI and Cloud Carbon Footprint than you do from the cloud provider audits. And this is because there are two basic ways of calculating carbon. One is called the location method, and one's called the market method. And we can go into that in a bit more detail, if you like.
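For readers who want to see the per-transaction normalization as arithmetic: the Software Carbon Intensity specification published by the Green Software Foundation defines SCI as ((E × I) + M) per R, where E is energy, I is grid carbon intensity, M is embodied emissions and R is the functional unit. The sketch below applies that formula; the function name and the input numbers are hypothetical and only illustrate the calculation.

```python
def sci_per_unit(energy_kwh: float,
                 intensity_g_per_kwh: float,
                 embodied_g: float,
                 functional_units: float) -> float:
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units


# Made-up example: 50 kWh at 400 gCO2e/kWh, plus 5,000 g of embodied carbon,
# spread over 10,000 transactions (for instance customer logins).
score = sci_per_unit(energy_kwh=50.0,
                     intensity_g_per_kwh=400.0,
                     embodied_g=5000.0,
                     functional_units=10_000)
print(f"SCI: {score} gCO2e per transaction")  # prints "SCI: 2.5 gCO2e per transaction"
```

The choice of R is the business decision described above: a login, a page view, or a movie streamed.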

Gael Duez 38:30
But for someone trying to reduce their workloads' carbon footprint, it makes more sense to use location-based data, I would say. And in that sense, I think SCI and CCF are pretty well aligned with tooling up Ops folks to try to do what they can achieve, whether or not the main cloud provider is trying on its own end to be as low carbon as possible when it comes to energy consumption.

Adrian Cockcroft 39:10
Well, there's a thing there because Google only provides location-based data and Microsoft and Amazon only provide market-based data. And so they're just not comparable numbers. You can't compare across cloud providers at all.

Gael Duez 39:31
And maybe this is a good time to discuss it a bit, because we were mentioning the future of the cloud industry, and one project you lead at the Green Software Foundation is actually about building a real time, well, not real time, almost real time standard about the carbon intensity of workloads, how to compare it between cloud providers and how to make it as actionable as possible. And just before you go into a bit more detail and explain what you expect and how far it has advanced in the past months: does it mean the end of Cloud Carbon Footprint as it is today if this standard is applied?

Adrian Cockcroft 40:14
I don't think so, because it would get better data and it would be more accurate. You may have some different options, but Cloud Carbon Footprint underneath is a bunch of guesses. Let me sort of unwind this a bit. The Green Software Foundation was set up a few years ago. It's an interesting organization, it's been quite successful. It's got lots of different groups working on lots of different things, including standardizing this Software Carbon Intensity metric. I proposed a Real Time Cloud project to provide a place where we could engage with the cloud providers and try to get them to align around providing the same data in the same way and getting it more up to date, because rather than every few months we want it to be more current, like every few hours or something. So I wrote that proposal, wouldn't it be nice if they all had a common standard, and that proposal then turned into a project. We've been meeting now for about nine months. Projects like this are very slow, but what we've done is map out every tool, piece of information, whatever, and how they're all related in a big Miro board which you can visit. It's a public Miro board. If you go to GitHub and find the Green Software Foundation and find Real Time Cloud, there's a GitHub account there with some information about the project, and the Miro board there kind of maps it out. And if you know of a tool that isn't on that, then let me know and we'll add it in there. So that's one of the things we've done, is sort of document end to end. Where does all this data come from? What's going on? And then we started trying to standardize various points, and one of the things that we're trying to wrap up right now is regional metadata. So this is data about a cloud region that changes fairly slowly, like once a year. So if you're running in us-east-1, which is the big AWS region in Virginia, what do we know about that region, year by year?
Well, there's all kinds of information that you can get on that. What are the different carbon coefficients that would apply to that region? What is the Electricity Maps lookup key? What is the WattTime lookup key for that region? If you pick one of the Google regions, they have a bit more data available, and for Azure, you get the PUE, the Power Usage Effectiveness. So if your computer is using a kilowatt in a data center and the PUE is 1.5, that means that it's pulling 1.5 kW from the grid. So it's a number that you have to multiply your energy use by for a particular data center. Data centers in Asia have particularly high PUE numbers because it's hot and humid and hard to cool. Some of them are up there as high as 2. Cloud providers generally run better than that, but commercial data centers in Singapore or wherever are quite often around 2.0. Some of the better data centers are sort of 1.08 or something like that, particularly if you're in Finland or somewhere where it's nice and cold and it's easy to keep the machines cool. So you just need to know what that number is for every region. So we're figuring out that number, and it's also a place where we are engaging with Google and Microsoft directly, and I'm using my contacts at AWS to engage with AWS more indirectly, to try and make sure that they're all hearing: please, can you do this number? Please, can you all do location? Can we get a PUE number that's comparable, and can we tidy up whatever's available and get more consistency across the cloud providers? And they've sort of bought into that as an idea, and we haven't made an enormous amount of progress, I think. But it's one of these things where it just takes a long time to get people to come into alignment, and it takes a long time for them to get around to updating new things. Most of the carbon data comes out in the summer, around June, July time.
So I think in the next few months we'll see more updated information from the cloud providers.
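The PUE arithmetic from this answer can be written out in a couple of lines. This is only a sketch: the function name is hypothetical, and the PUE values are the illustrative figures mentioned in the conversation (1.5, around 2.0, and about 1.08), not measurements for any specific data center.

```python
def grid_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Energy drawn from the grid for a given IT energy use: IT energy times PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


# The example from the conversation: 1 kW of compute at PUE 1.5 pulls 1.5 kW,
# so one hour of that load is 1.5 kWh from the grid.
print(grid_energy_kwh(1.0, 1.5))

# Same 100 kWh workload in a hot, humid site versus a cold-climate site.
for label, pue in [("PUE 2.0", 2.0), ("PUE 1.08", 1.08)]:
    print(f"{label}: {grid_energy_kwh(100.0, pue):.1f} kWh from the grid")
```

This is why a comparable, per-region PUE matters: the same workload can draw nearly twice the grid energy depending on where it runs.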

Gael Duez 43:58
And Adrian, I was wondering whether the recently passed European Energy Efficiency Directive (EED), which imposes reporting requirements on data centers above 500 kW, so all hyperscalers are obviously above this, will be an accelerator. Will the kind of data that is required, PUE, but also the absolute values, not only the relative value, be an enabler, or is it a different approach than the one that you're pushing through the Green Software Foundation?

Adrian Cockcroft 44:38
Really, this project is trying to be a focal point for communication with the cloud providers. We're not trying to duplicate work happening elsewhere. So wherever something like that is happening, we'll just refer to it and go and ask the cloud providers, well, what are you doing about it? And can you please do the same thing about it and try to coordinate somewhat? So that's the general idea I'm trying to achieve. You can say that all models are wrong, some models are useful. What I'm trying to do in this project is to make the models a little bit less wrong, a little bit more useful. They're never going to be fully accurate, they're never going to be quite what we want, but we're trying to sort of nudge things in the right direction. And we have a group of people that engage pretty regularly. There's a meeting every two weeks, and I recommend that people who are interested join the Green Software Foundation if their company is there, or at least go visit the website, look at the project and the work that's been done there.

Gael Duez 45:40
Yeah, thanks a lot. And we will put the links in the show notes as usual. You know, we are heading to the end of the podcast and I realize that we've been talking about cloud providers at large. Actually, maybe it's time to talk a bit more specifically about AWS, because you've got great experience there. And I wanted to ask a pretty general question about AWS: how do you feel about the current situation at AWS for the sustainability team, what they've been able to achieve since you left, what didn't move that much, and the general trend? Because, you know, when I was researching for this episode, I reviewed the slides that you put together for, it was not QCon, I think it was CNCF Sustainability Week. And some of them were pretty depressing when you were comparing AWS with GCP and Azure. And I was wondering, how do you feel about it? Because hey, you're the guy who pushed for sustainability to become a pillar at AWS. So I guess the situation is not all that dark and not all that light. So, could you enlighten us a bit about it?

Adrian Cockcroft 47:00
Yeah, I think there's a lot of good work going on where AWS is helping customers decarbonize their companies. On that side of it, there are lots of good examples. They're happy to talk about that, and that is a significant thing. The other good area is all of the investments in energy generation. And then it comes down to the carbon footprint tool, which was created by a real... I don't know exactly how to describe it. It wasn't a full-on service team effort, right? It was done by a team that wasn't really one of the product teams, because they knew customers needed something, and they managed to get it put into the billing information area of the AWS console. That data was hard to get produced and hard to get released. So that was kind of annoying. And then it just sat there. There have been basically no updates on it in two years. And one of the reasons is that just as they were kind of going, okay, we need to staff up a team to do this properly, there were some internal reorgs that happened as well. And then the sort of big layoff crunch happened, and they just got short on headcount everywhere. And if you look across the company, where are you going to leave your headcount? You're going to put your headcount on things that generate revenue. This is not revenue generating. And you're going to put things in areas where there is a strong leader who is advocating for this thing being important. And the carbon stuff didn't fit into the normal systems. It was being run by a group within the energy team that builds data centers. It wasn't really being run by the people that build services. It wasn't part of one of the collections of service teams. It's not part of the S3 storage organization, or the EC2 organization, or the Lambda organization, or whatever. There are these large groups of software services. You look at all the AWS services, they form groups, and each group has a manager and VPs going all the way up and a budget.
This was done on the outside of that, as part of the energy team, whose job was basically to buy PPAs. Most of the work they're doing is doing contracts to build solar farms. And on the side, they built this little tool and managed to get it stuck out so that customers could see it, but it was never properly invested in. So that was the situation two years ago. And then it got worse, because there were layoffs. And then in the last year they've started to reinvest in that team and hired people, but they haven't actually released anything yet. What I can say is that they do now have a team and they are working on it and they are staffed to do something that hasn't come out yet. I'm hoping that they will release some more stuff this summer. And as I mentioned, there is an annual cycle where carbon announcements tend to happen around June, July time. The 2023 data takes almost six months to accumulate and get it all tied up and audited. And they generally release an annual report in the summer. And everybody does that report at various times. For Amazon, because it's such a large, complex organization, it's got an airline, it's got delivery stuff, manufacturing, it actually takes longer to gather everything and sort it out. So we get an update then. And I'm hoping that part of that update will include some more updates on what's going on with AWS. I'd say I'm unhappy that nothing has happened for a year or two, but I'm more optimistic now than I was a year or so ago because there does seem to be a team at least working on it internally.

Gael Duez 50:57
Where I'm a bit surprised is that the AWS culture, at least at the beginning, was to really give your customers what they want, and they were super reactive, like every day is day one, etcetera. You have a lot of customers, at least in Europe, at least for regulatory reasons, asking for tools. And I was surprised that AWS didn't see it as a business opportunity, not necessarily generating tons of revenue, but being some sort of a market differentiator against competitors. And to be honest, there is one analysis that I found which, I would say, best explains the pretty strong commitment of Google Cloud Platform regarding sustainability. And as you say, they provide location-based data rather than market-based data, those things are not comparable, et cetera, but they brand it quite a lot. They sustain the effort because they're number three and they're significantly behind both AWS and Azure. And my bet, and I didn't get any internal information, I will try, but I didn't manage so far, is that they're using the sustainability angle as a market differentiator. So why didn't AWS catch this opportunity to differentiate, according to you?

Adrian Cockcroft 52:30
I think that when it comes down to it, how many deals are going to be lost just on that one thing? Almost all deals are won on other aspects. They're won by price, features, product stability, security, and the sales relationship. A good sales team will sell something against another team. Most of the time, AWS has been able to sell its way around the carbon data, and where it's been super critical, they've just done a deal with that customer. If it's a big enough deal, they've just done a deal that says, we'll give you some information under NDA that will give you the information you need. So when it comes down to it, if that is the last thing that's going to make or break a deal, they will do an NDA-level disclosure of whatever that customer wants. But that doesn't solve the problem for everyone else. It doesn't solve the problem for small customers that just want to get some data and optimize. So I think that fundamentally, AWS is very customer driven, but it's also driven by the customers that spend money on things. They are driven by customers' purchasing power rather than "yes, I'd also like this other thing, but it's kind of optional because it's not going to make or break a deal". And I think that what it came down to for a long time was that not enough deals were being lost that they could document were lost because of a lack of transparency on sustainability, because the underlying actions they are taking with all of the power purchases are good. They do actually have a substantially low carbon cloud footprint. It's just that the data that they're providing doesn't let you do anything useful with it from a developer point of view. So we were advocating for this while I was there, that we needed some kind of developer-oriented tool that you can use for optimizing workloads, and it never really happened.
And the sort of counter argument is, well, if you just wait long enough, we'll get rid of all the carbon and it'll be zero carbon anyway in that region. But that's Scope 2. That counts for the energy, and what's left is the Scope 3, which they're also working on but haven't released, which is going to be dominated by the amount of carbon it took to make the machines that you're using. So that's the sort of internal viewpoint. I think the team that was working on it was in the wrong organization. That was my kind of thing. As I left, I said, there needs to be a real team, and they aren't in the right part of the right organization, but it wasn't clear where they should belong and the managers of the different parts of the organization didn't want to move things around. So I think that that has finally moved. They are actually part of an organization that makes sense and they have got a team there now. So I think that's the thing that got fixed, but it took a while and it's only now that they're hopefully working on something that's going to come out soon. So I know there's a team there. I couldn't say if I did know anyway, but I don't know when they're going to release what, other than that in general, they've said they are going to do Scope 3 at some point. I think they're going to do better. I hope they're going to do PUE. That seems likely. There are some other things they could do that would be fairly easy, and they need to be more granular on their carbon tool. So one example: I was looking at a carbon report and it had three buckets. It says, here's your S3 footprint, here's your EC2 footprint, here is "other", and "other" is all other services bundled together. And the numbers sort of moved from month to month, and then at some point "other" shot up and became more than EC2 and S3.
And we weren't sure why. And then, for geography, they just have Europe, all of the Americas, and Asia. And it wasn't clear which region this was in because the data wasn't per region, so we couldn't tell. And when there are multiple regions we were using in this geography, we couldn't tell which region it was or which service it was. And this workload now is dominated by whatever is in "other". And we can't figure out what to do about that. It's going up, the traffic's increasing, so it's going up anyway. But we're trying to work out what to do about it and there's just not enough resolution in the information to figure that out. And that's kind of been escalated internally, to say, come on, we need more data. Here's a clear example of why you need to provide everybody more data. Now, Google and Microsoft have much more fine-grained data, but it is still delayed by a month or two. They'll tell you service by service and region by region, and I think down to maybe zone level on Google, so the data is there. One of the reasons they don't want to provide the data is that you can reverse engineer what's going on inside the organization from a business point of view, and they regard it as sort of leaking proprietary information. But since the other cloud providers are already leaking that information, it doesn't seem that it's that much of a big deal, or it shouldn't be that much of a big deal. So that's all the arguments. But ultimately, what it takes is having the right leader and the right organization with the mandate to go and do something, and then it just takes time. And that's, I think, where we are right now. I'm hopeful, but still waiting for more useful information out of Amazon.

Gael Duez 58:42
Well, that was a very comprehensive tour of public cloud issues and solutions and very transparent answers on what is going on in AWS, in your own opinion, because I say it again, you don't work at AWS anymore. So thanks a lot for all this transparency exercise and pedagogical exercise. Also, I think it was very valuable. It might be a bit of a nightmare to make some hard choices of where to cut and what to keep, but thanks a lot, Adrian, that was really lovely to have you on the show.

Adrian Cockcroft 59:17
Yep, thank you.

Outro 59:19
Thank you for listening to this Green IO episode. If you enjoyed it, share it and give us five stars on Apple or Spotify. We are an independent media relying solely on you to get more listeners. Plus, it will give our little team, Jill, Meibel, Tani and me, a nice boost. Green IO is a podcast and much more. So visit greenio.tech to subscribe to our free monthly newsletters, read the latest articles on our blog, and check the conferences we organize across the globe. The next one is in London on September 19. Early bird tickets are available until June 18, but you can get a free ticket using the voucher GREENIOVIP. Just make sure to have one before they're all gone. I'm looking forward to meeting you there to help you, fellow responsible technologists, build a greener digital world one byte at a time.

Written by Jill TELLIER