<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Cost Saving - Tag - Lorenzo's Blog</title><link>https://www.k8s.it/tags/cost-saving/</link><description>Cost Saving - Tag - Lorenzo's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 14 Feb 2023 00:00:00 +0000</lastBuildDate><atom:link href="https://www.k8s.it/tags/cost-saving/" rel="self" type="application/rss+xml"/><item><title>HPA vs Rate-limit</title><link>https://www.k8s.it/posts/hpa-vs-rate-limit/</link><pubDate>Tue, 14 Feb 2023 00:00:00 +0000</pubDate><author>Lorenzo Girardi</author><guid>https://www.k8s.it/posts/hpa-vs-rate-limit/</guid><description><![CDATA[<div class="featured-image">
                <img src="/images/hpa-vs-rate-limit/Screenshot_2023-02-14_at_20.15.33-removebg-preview-2.png" referrerpolicy="no-referrer">
            </div><h2 id="intro">INTRO</h2>
<p>Strange&hellip; we use HPA to increase availability, and then introduce rate limiting, which reduces it?</p>
<p>Well, let&rsquo;s set the context.</p>
<p>This analysis rests on the following assumptions:</p>
<ul>
<li>Cloud environment</li>
<li>Dynamic infrastructure</li>
<li>Minimum resources available</li>
</ul>
<h3 id="hpa">HPA</h3>
<p>In Kubernetes, a <em>HorizontalPodAutoscaler</em> (HPA) automatically updates a workload resource (such as a Deployment or StatefulSet) by adjusting its number of replicas, scaling the workload horizontally to match demand.</p>
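<p>As a rough illustration, the Kubernetes documentation describes the core HPA rule as <code>desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]</code>, skipped when the ratio is within a tolerance (0.1 by default) and bounded by <code>minReplicas</code> and <code>maxReplicas</code>. The Go sketch below models only that calculation; the <code>desiredReplicas</code> function and the sample numbers are hypothetical, and the real controller also deals with missing metrics, pod readiness and scaling policies.</p>

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the HPA controller's default of 0.1 (10%): if the
// metric ratio stays within this band around 1.0, nothing is scaled.
const tolerance = 0.1

// desiredReplicas is a hypothetical helper sketching the documented HPA rule:
// ceil(current * currentMetric / targetMetric), clamped to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	ratio := currentMetric / targetMetric
	// Close enough to the target: keep the current replica count.
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	desired := int(math.Ceil(float64(current) * ratio))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// 4 pods at 90% average CPU against a 60% target: ceil(4 * 1.5) = 6.
	fmt.Println(desiredReplicas(4, 90, 60, 2, 10))
	// 4 pods at 63% against 60%: ratio 1.05 is within tolerance, stays at 4.
	fmt.Println(desiredReplicas(4, 63, 60, 2, 10))
	// 4 pods at 15% against 60%: ceil(4 * 0.25) = 1, clamped to minReplicas = 2.
	fmt.Println(desiredReplicas(4, 15, 60, 2, 10))
}
```

<p>With a 60% CPU target, 4 pods averaging 90% become 6, while a 5% deviation (63%) falls inside the tolerance and changes nothing.</p>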
<h4 id="patterns">Patterns</h4>
<table>
	<thead>
			<tr>
					<th>Type</th>
					<th>Behaviour</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Slow and temporary</td>
					<td>Daily fluctuations, peaking during the day and troughing at night</td>
			</tr>
			<tr>
					<td>Rapid and temporary</td>
					<td>Short bursts from poorly behaved downstream services</td>
			</tr>
			<tr>
					<td>Slow and persistent</td>
					<td>Request volume slowly increases as the product sees adoption</td>
			</tr>
			<tr>
					<td>Rapid and persistent</td>
					<td>Abrupt shift from low to high volumes, e.g. when called by batch jobs</td>
			</tr>
	</tbody>
</table>
<h4 id="ideal-practice">Ideal Practice</h4>
<table>
	<thead>
			<tr>
					<th>Type</th>
					<th>Ideal Practice</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Slow and temporary</td>
					<td>HPA should add and remove pods as necessary</td>
			</tr>
			<tr>
					<td>Rapid and temporary</td>
					<td>HPA should NOT modify the pod count; leave enough headroom to absorb brief spikes</td>
			</tr>
			<tr>
					<td>Slow and persistent</td>
					<td>HPA should add and remove pods as necessary</td>
			</tr>
			<tr>
					<td>Rapid and persistent</td>
					<td>Leave headroom to absorb the initial jump while HPA quickly adds pods to restore target utilization</td>
			</tr>
	</tbody>
</table>
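<p>In Kubernetes, the &ldquo;ignore brief spikes&rdquo; versus &ldquo;react quickly&rdquo; trade-off in the table above is tuned through <code>spec.behavior</code> in <code>autoscaling/v2</code>, notably the <code>stabilizationWindowSeconds</code> of <code>scaleUp</code> and <code>scaleDown</code> (300 seconds by default for scale-down, 0 for scale-up). The sketch below is a simplified model of that stabilization idea, using one window for both directions and counting evaluations instead of seconds: a scale-up is capped at the lowest recommendation in the window and a scale-down at the highest, so only a change that holds across the whole window takes effect. The <code>stabilize</code> helper is hypothetical.</p>

```go
package main

import "fmt"

// stabilize is a hypothetical, simplified model of the HPA stabilization
// window. window holds the raw replica recommendations from the most recent
// evaluations (the real controller keeps separate, time-based windows for
// scale-up and scale-down; here one window serves both for simplicity).
func stabilize(current int, window []int) int {
	if len(window) == 0 {
		return current // nothing recorded yet: keep the current count
	}
	lowest, highest := window[0], window[0]
	for _, r := range window[1:] {
		if r < lowest {
			lowest = r
		}
		if r > highest {
			highest = r
		}
	}
	switch {
	case lowest > current:
		// Every recommendation in the window asks for more pods:
		// scale up, but only as far as the smallest of them.
		return lowest
	case highest < current:
		// Every recommendation asks for fewer pods: scale down,
		// but only as far as the largest of them.
		return highest
	default:
		// Mixed signals (e.g. a brief spike): hold steady.
		return current
	}
}

func main() {
	// Rapid and temporary: a single spike to 9 inside the window, stays at 4.
	fmt.Println(stabilize(4, []int{4, 4, 9, 4, 4}))
	// Rapid and persistent: 9 for the whole window, scales to 9.
	fmt.Println(stabilize(4, []int{9, 9, 9}))
	// Just after the burst ends: a 9 is still in the window, holds at 9.
	fmt.Println(stabilize(9, []int{9, 4, 4}))
	// Load stayed low for the whole window, scales down to 4.
	fmt.Println(stabilize(9, []int{4, 4, 4}))
}
```

<p>The first two cases show the trade-off: a scale-up window long enough to hide a rapid, temporary burst also delays the response to a rapid, persistent shift by the length of that window, which is why both rows rely on headroom.</p>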
<h3 id="rate-limit">Rate Limit</h3>
<p>A rate limit is the number of API calls an app or user can make within a given time period. If this limit is exceeded, or if CPU or time limits are exceeded, the app may be throttled, and throttled requests fail.</p>]]></description></item></channel></rss>