Tags « ipSpace.net blog

high availability

High availability refers to the ability of a system or application to keep running even if there is a problem or failure. This is important because if a system or application goes down, it can cause problems for those who rely on it. To achieve high availability, multiple copies of the system or application are set up in different locations so that if one fails, the others can take over and keep things running smoothly. This helps ensure that the system or application is always available when needed.

ChatGPT explaining application high availability to a high school kid

Before going into the details, it’s worth figuring out what the application (or system) users need as opposed to what they think they need:

Fifty Shades of High Availability (2020)
Figure Out What the Customer Really Needs (2017)
Are Business Needs Just Excuses for Vendor Shenanigans? (2020)
Redundancy Does Not Result in Resiliency (2017)
High Availability Planning: Identify the Weakest Link (2016)
Meaningful Availability (2020)
Differential Availability (2020)

Not surprisingly, IT vendors sell magic infrastructure solutions as the high-availability panacea based on the assumption that redundant infrastructure cannot fail. Nothing could be further from the truth:

High Availability Fallacies (2011)
If Something Can Fail, It Will (2012)
How Hard Is It to Think about Failures? (2016)
This Is What Makes Networking So Complex (2013)
Decide How Badly You Want to Fail (2019)
Sometimes You Have to Decide How You Want to Fail (2015)
Some People Don’t Get It: It Will Eventually Fail (2016)
The Network Is Reliable and Other Stories (2016)
Circular Dependencies Considered Harmful (2021)

High Availability Concepts, Technologies, and Solutions

You can use a plethora of approaches depending on your availability targets:

Disaster recovery is the right tool for the job if you’re OK with the system being down for a few hours.
Automatic restart of application instances combined with disaster recovery is acceptable if you can tolerate the system being down ~0.1% of the time (99.9% availability).
Availability targets higher than 99.9% can only be reached reliably with proper application design supported by well-designed infrastructure.
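
To make those targets more tangible, here is a minimal sketch that converts an availability percentage into the downtime it allows per year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for this illustration; they are not taken from any of the linked blog posts.

```python
# Convert an availability target into the downtime budget it implies.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) allowed within the period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.1f} minutes "
              f"({minutes / 60:.2f} hours) of downtime per year")
```

A 99.9% target works out to about 8.76 hours of downtime per year, which is why a "down for a few hours" disaster recovery procedure can still fit into that budget if it is not triggered too often, while every additional nine shrinks the budget tenfold.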

I wrote over 130 blog posts on these topics. It would be impossible to list all of them on a single page; major high-availability technologies or concepts thus have dedicated pages:

One of the prerequisites for highly available services is also redundant networking infrastructure:

Redundant Data Center Internet Connectivity – Problem Overview (2013)
Redundant Data Center Internet Connectivity – High-Level Design (2013)
Coping with Byzantine Routing Failures (2014)
Site and Host Multihoming (2023)
High Availability Switching (2024)

Regardless of your approach, the only sustainable way to get highly available services is the correct design of the application stack. For more details, watch the Designing Active-Active and Disaster Recovery Data Centers webinar; I also wrote a few blog posts on the topic:

Swimlanes, Read-Write Transactions and Session State (2017)
Solving the Problem in the Right Place (2017)
Moving Complexity to Application Layer? (2017)
Optimizing the Time-to-First-Byte (2021)

Notable Outages

Finally, here are a few notable outages. TL&DR: it can happen to the big guys and will eventually happen to you.

How GitHub Learned How Hard Distributed Systems Are (2023)
Cloudflare Control Plane Outage (2023)

NSX

network automation

video

We published hundreds of public videos covering dozens of technologies on ipSpace.net. Networking technologies covered in free videos include:

Artificial Intelligence and Machine Learning

Introduction to AI/ML Hype (2021)
Machine Learning 101 (2021)
Machine Learning Techniques (2022)
Use Cases for AI/ML in Networking (2022)
The Long Tail of AI/ML Problems (2022)
Ugly Challenges of Using AI/ML in Networking (2022)
Language Models in AI/ML Landscape (2023)
Language Model Basics (2023)

More in the AI/ML in Networking: The Good, the Bad and the Ugly webinar (with more videos coming soon).

Border Gateway Protocol (BGP)

Simplify BGP Configurations (2017)
History of BGP Route Leaks (2023)
Hacking BGP for Fun and Profit (2023)
Outages Caused by Bugs in BGP Implementations (2023)

More in the Network Security Fallacies part of the How Networks Really Work webinar and the Internet Routing Security webinar.

Business Aspects of Networking Technologies

Define the Problem Before Searching for a Solution (2020)
Know Your Users' Needs (2020)
Should You Build or Buy a Solution? (2020)
High-Level Technology Guidelines (2021)
Lessons Learned: Technology Still Matters (2021)
Lessons Learned: Fundamentals Haven't Changed (2021)
Lessons Learned: Complexity Will Kill Your System (2021)
Some Services Are Not Worth Delivering (2021)
Lesson Learned: The Way Forward (2022)

More in the Business Aspects of Networking Technologies webinar.

Cloud Networking

Cloud Models, Layers and Responsibilities (2019)
Public Cloud Networking Overview (2020)
We Still Need Networking in Public Clouds (2021)
Public Cloud Networking Is Different (2021)
How Can You Master Public Cloud Networking? (2021)
Cloud Services Hierarchy (2022)
Functions-as-a-Service Demo (2022)
Cloud-Native Environments (2022)
Cloud Infrastructure-as-Code (2022)
Migrating into a Cloud (2023)

Cumulus Linux

What Is Cumulus Linux All About? (2015)
Cumulus Linux Base Technologies (2015)
Cumulus Linux Architecture (2015)
What is Cumulus Linux All About (2020)
Simplify Device Configurations with Cumulus Linux (2020)
NetQ and Cumulus Linux Data Models (2020)

Ethernet VPN (EVPN)

EVPN Multihoming Taxonomy and Overview (2022)
EVPN Multihoming Deep Dive (2022)
MLAG with EVPN (2023)
vPC Fabric Peering with EVPN Multihoming (2023)
Advantages and Drawbacks of EVPN-based Multihoming (2023)

FRRouting

FRRouting Overview (2019)
FRRouting Architecture (2020)
FRRouting Configuration and Performance Optimizations (2020)
FRRouting Usability Enhancements (2020)
FRRouting Deployment Guidelines (2020)

IPv6 Security

Reconnaissance in IPv6 (2012)
IPv6 Secure Neighbor Discovery (SEND) (2013)
IPv6 Source Address Validation Improvement (2013)
IPv6 uRPF and Neighbor Discovery Throttling (2013)
IPv6 Address Assignment and Tracking (2013)
Dual-Stack Security Exposures (2013)
IPv6 Security Overview (2020)
IPv6 Trust Model (2022)
Practical Aspects of IPv6 Security (2022)
Rogue IPv6 RA Challenges (2022)
IPv6 RA Guard and Extension Headers (2022)
Testing IPv6 RA Guard (2022)
Traffic Filtering in the Age of IPv6 (2022)
IPv6 Traffic Filtering Details (2022)

Kubernetes

Why Do We Need Kubernetes? (2021)
Kubernetes Principles (2021)
Kubernetes Architecture (2022)
Kubernetes Networking Model (2022)
Understanding Kubernetes Pods (2022)
Typical Kubernetes Inter-Pod Traffic Walk (2022)
Kubernetes Services Overview (2022)
Kubernetes Services Types (2022)
Exposing Kubernetes Services to External Clients (2022)
Kubernetes SDN Architecture (2023)
Sample Kubernetes SDN Implementations (2023)
Kubernetes Container Networking Interface (CNI) (2023)
Kubernetes Calico Plugin (2023)

More in the Kubernetes Networking Deep Dive webinar (with more videos coming soon).

Leaf-and-Spine Fabrics

Multi-Stage Clos Fabrics (2013)
Building a L3-Only Data Center with Cumulus Linux (2016)
SPB Deep Dive (2017)
Overlays in Data Center Fabrics (2017)
Routing on Hosts Deep Dive (2017)
Challenges of Data Center Fabric Deployments (2017)
Building Data Center Fabrics with SPB (2017)
Building a Pure Layer-3 Data Center with Cumulus Linux (2017)
Data Center Fabric Validation (2017)
Separate Data from Code (2017)

Networking Fundamentals

Overview of Networking Challenges (2019)
Introducing Transmission Technologies (2019)
Beyond Two Nodes (2019)
The Need for Network Layers (2019)
Retransmissions and Flow Control in Computer Networks (2019)
Putting the Networking Layers Together (2019)
Breaking the End-to-End Principle (2019)
Fallacies of Distributed Computing (2020)
The Network Is Not Reliable (2020)
End-to-End Latency Is Not Zero (2020)
Bandwidth Is Neither Infinite Nor Cheap (2020)
Networks Are (Not) Secure (2020)
Internet Has More than One Administrator (2020)
Networks Are Not Homogenous (2020)
What Are Bridging, Routing, and Switching? (2020)
Getting a Packet Across a Network (2020)
Finding Paths Across the Network (2021)
Path Discovery in Transparent Bridging and Routing (2021)
Transparent Bridging Fundamentals (2021)
IP Routing Fundamentals (2021)
Comparing Routing and Bridging (2021)
Typical Large-Scale Bridging Use Cases (2021)
Introduction to Network Addressing (2021)
Theoretical View of Network Addressing (2021)
Early Data-Link-Layer Addressing (2021)
Local Area Network Addressing (2022)
Network Layer Addressing (2022)
Comparing TCP/IP and CLNP (2022)
Combining Data-Link- and Network Layer Addresses (2022)
Network Address Assignments (2022)
Network Address Scopes (2022)
The Basics of Network Address Translation (NAT) (2022)
Routing Protocols Overview (2022)
Link State Routing Protocol Basics (2023)
Link State Routing Protocol Implementations (2023)

More in the How Networks Really Work webinar (with more videos coming soon).

Networking Labs

Could I Use netlab instead of GNS3? (2022)
What Can Netlab Do? (2022)
Getting Started with netlab (2023)
netlab Topology File (2023)
netlab IP Address Management (IPAM) (2023)

More in the Network Automation Tools webinar (with more videos coming soon).

Software-Defined WAN (SD-WAN)

What Is SD-WAN? (2018)
SD-WAN Reference Design (2018)
Going Beneath the Cisco SD-WAN Surface (2020)
Cisco SD-WAN Fundamentals and Definitions (2020)
Cisco SD-WAN Solution Architecture and Components (2020)
Cisco SD-WAN Routing Goodness (2020)
Cisco SD-WAN Onboarding Process (2020)
Cisco SD-WAN Policies and Centralized Magic (2021)
Cisco SD-WAN Policies Review (2021)
Cisco SD-WAN Routing Design (2021)
Cisco SD-WAN Site Design (2021)
Cisco SD-WAN Policy Design (2021)
Managed SD-WAN Services (2022)
Challenges of Managed SD-WAN Services (2022)
SD-WAN Backend Architecture (2023)
SD-WAN CPE Architecture (2023)
Security Aspects of SD-WAN (2023)

More in Software-Defined WAN (SD-WAN) Overview, Cisco SD-WAN and Business Aspects of Networking Technologies webinars (with more videos coming soon).

Switching and ASICs

Switch Buffer Architectures (2017)
Big- or Small-Buffer Switches (2018)
Tools and Knobs to Use when Tweaking TCP Performance (2018)
ASICs 101 (2020)
Packet Buffers in Data Center ASICs (2023)
Chassis Switch Architectures (2023)
Types of Switching ASICs (2023)

Azure

cloud

TCP

DNS

WAN

containers

load balancing

EIGRP

EIGRP was the best choice for an interior gateway protocol in the late 1990s – it was fast, efficient, and easy to deploy. OSPF and IS-IS implementations improved in the intervening 30 years, slowly turning EIGRP into a forgotten technology.

On a more serious note, I wouldn’t deploy EIGRP in new network designs for compatibility reasons (no major networking vendor apart from Cisco implemented it), and I’d use BGP in designs where a single router has to deal with hundreds of adjacent routers (the only scenario where EIGRP still outshines OSPF and IS-IS).

While the ultimate sources of EIGRP wisdom remain the EIGRP Network Design Solutions Cisco Press book and RFC 7868, you might want to read these articles and blog posts describing EIGRP implementation details and deployment guidelines.

The Basics

Implementation Details

EIGRP Deployment Scenarios

NTP

ACI

DMVPN

DMVPN is an old¹ Cisco-proprietary technology that combines NHRP, IPsec, IKEv2 and multipoint GRE tunnels to build dynamically-provisioned multi-access VPNs.

The easiest way to master DMVPN is to watch the ipSpace.net DMVPN webinars, and every now and then someone still finds them somewhat useful:

I also wrote dozens of DMVPN-related blog posts. Hope you’ll enjoy them!

The Basics

DMVPN always relies on a hub-and-spoke topology, but enables direct communication between spokes (Phase-2 DMVPN) and simplified routing with NHRP redirects (Phase-3 DMVPN).

Routing Protocols in DMVPN Networks

Routing protocols face significant challenges in DMVPN networks due to the very large number of directly connected neighbors, with EIGRP faring better than OSPF, and BGP being the only viable solution in deployments with a very large spoke-to-hub ratio.

Typical DMVPN Designs

DMVPN Deployment Guidelines

Integration with Other Network Technologies

DMVPN Alternatives

Quirks and Implementation Details

I wrote numerous blog posts documenting DMVPN quirks while preparing the materials for the DMVPN webinars. Most of these blog posts were written in the early 2010s and might no longer be relevant.

¹ As in: created around 2010. For more details, listen to the History of DMVPN with Mike Sullenberger.
