Unlocking High Availability and Disaster Recovery for Microsoft SQL Server on OCI

In today’s fast-paced digital landscape, ensuring business continuity for Microsoft SQL Server databases is essential deploying Always On Availability Groups on Oracle Cloud Infrastructure (OCI) delivers a robust, centralized solution for High Availability (HA) and Disaster Recovery (DR), minimizing downtime and boosting resilience through expandable multi-node clusters that leverage OCI’s powerful infrastructure. For organizations migrating on-premises SQL Server setups, OCI offers transformative benefits like seamless transitions with minimal disruption via automatic seeding, failover capabilities, Marketplace Images, or Bring Your Own License (BYOL), while optimizing performance, compliance, and scalability in regulated sectors without major overhauls. If you’re already on OCI, integration is straightforward with pre-configured networking, security lists, Bastion VMs, familiar Windows/PowerShell tools, Active Directory support, and virtual listeners for uninterrupted application connectivity—freeing teams to innovate rather than manage complexity. Overall, this approach provides superior performance via multi-subnet optimizations, enhanced security through private subnets, effortless scalability, significant cost savings, and seamless OCI ecosystem integration, empowering faster ROI, reduced risks, and future-proof data strategies.

OCI SQL Server Always On Availability Groups - canva Optimized

What This Covers

Oracle Official Steps + Your Real-World Troubleshooting Experience

✔ OCI multi-subnet Always On Availability Group deployment walkthrough

✔ Step-by-step tasks from Oracle documentation

✔ Incident: Role down & “5035 no matching interface”

✔ Root cause & fix

Technology Stack (At a Glance)

🖥 OS : Windows Server 2025 (Standard)

🗄 Database : SQL Server 2022 (Enterprise)

☁ Platform : Oracle Cloud Infrastructure (OCI)

🔁 HA Model : Always On Availability Groups

🧩 Cluster : Windows Server Failover Cluster (WSFC)

Architecture & Prerequisites

[VCN] → [3 Private Subnets]

A: SQL Node 1

B: SQL Node 2

C: Quorum Witness

Sql Server AOAG Flow Diagram

✔ Each SQL VM in a separate private subnet (recommended by Microsoft/Oracle)

✔ Active Directory domain present

✔ Bastion host for private SSH/RDP access

✔ Security list rules: port 1433 (SQL), 5022 (AG endpoint)

Task Flow (From Oracle)

Task 1 → Task 2 → Task 3 → Task 4 → Task 5

Task 1 — Configure Secondary IPs (OCI)

[Primary IP]

+ Secondary IP for WSFC

+ Secondary IP for AG Listener

Steps (OCI Console):

1. Go to VM → Attached VNIC → IPv4 Addresses

2. Add WSFC IP

3. Add AG Listener IP

4. Repeat for second SQL node

Checklist

• Node1 Database Server in a private subnet

o Primary IP Operating System access

o Secondary IP 1: Windows Server Failover Cluster IP.

o Secondary IP 2: SQL Server Always On availability groups listener.

• Node2 Database Server in a private subnet

o Primary IP Operating System access

o Secondary IP 1: Windows Server Failover Cluster IP.

o Secondary IP 2: SQL Server Always On availability groups listener.

• Node3 Quorum VM in a private subnet.

o Primary IP Operating System access

Task 2 — Create SQL Server Service Account (AD)

Domain Admin → Create new SQL Service Account

Steps:

1. Log into Domain Controller

2. Open AD → Users → New User

3. Enter account & password

4. Add to local admin & SQL sysadmin roles

Task 3 — Configure Quorum File Share

[Quorum VM]

→ Create folder

→ Share (Everyone or SQL Service Account)

Steps:

1. Log into Quorum VM

2. Create share folder

3. Set sharing permissions

Task 4 — Windows Server Failover Cluster (WSFC)

4.1 Add Failover Clustering Feature

SQL Node 1 & SQL Node 2

Open Server Manager → Add roles/features → Failover Clustering

Create the Cluster (Official)

Wizard Flow

Open Failover Cluster Manager

→ Create Cluster

→ Select Both Nodes

→ Run Validation

→ Name & Finish

Expected Outcome

A cluster is created, but initially offline core resources

⛔ Your Real-World Issue

Cluster role status: ❌ DOWN

Error:

“No matching network interface found

for resource ‘Cluster IP Address’

IP ‘10.0.1.x’ (5035)”

This happens even when:

✔ Network design is correct

✔ Security rules are correct

✔ Cross-subnet communication works

This is not always a network problem more on that below.

Fixing the Cluster (Your Diagnosis + Root Cause)

Root cause: SQL Servers were NOT on the latest patch levels

Steps taken:

Upgrade SQL Server Node1

Upgrade SQL Server Node2

Revalidate WSFC

Verify Cluster Role → ONLINE

Result

Before: Cluster Role ❌ Down

After: Cluster Role ✅ Online

Why It Happened

Outdated SQL builds can cause Windows Failover Cluster resource instabilities, especially when:

✔ Using Always On AG

✔ Multi-subnet environments

Pro Tip

Always patch SQL Server CU BEFORE WSFC configuration

Task 4 Continued — Configure Cluster IPs & Quorum

From Oracle Official Steps

1. Configure quorum witness (File Share)

2. Set cluster IPs with secondary private IPs added in Task 1

Visual:

Configure Cluster Quorum → File Share

Assign WSFC IP → Bring Cluster Online

Check:

• Cluster Quorum Configured

• WSFC Secondary IPs Assigned

• Cluster Name/Computer Object Created

Task 5 — Always On Availability Groups Setup

5.1 Grant Permissions

Add cluster computer object permission in AD:

Cluster → Security → Allow: Create Computer Objects

(Required for AG listener and DNS actions)

Enable Always On & Configure AG

5.2 Enable Always On

SQL Configuration Manager

→ SQL Server Service Properties

→ Enable Always On

→ Restart SQL Server service

Repeat on both nodes

5.3 Create & Backup Sample DB

SSMS → Create database

→ Full backup

Required before AG creation

5.4 Create Availability Group

SSMS → New Availability Group Wizard

→ Add Replicas

→ Select Sync/Failover Mode

→ Finish

Listener steps skipped here; configured separately

5.5 Add AG Listener

Add Listener:

→ DNS name

→ Static IPs (per subnet)

Include both listener IPs created earlier

Final Flow (All In One)

1. OCI VNIC Secondary IPs

2. AD Service Account

3. Quorum File Share

4. Enable WSFC

5. Create WSFC (Cluster)

6. Fix: SQL Patch Update → Cluster Online

7. Grant AD Permissions

8. Enable Always On

9. Create Sample DB

10. Create AG

11. Add AG Listener

Key Lessons

Category Key Takeaway

Networking Multi-subnet requires correct IPs but not always the cause of 5035

Patching Always match SQL Server cumulative updates on all nodes

Cluster WSFC will fail silently if SQL build mismatch

AG Listener needs static IPs on each subnet