Unlocking High Availability and Disaster Recovery for Microsoft SQL Server on OCI
In today’s fast-paced digital landscape, ensuring business continuity for Microsoft SQL Server databases is essential deploying Always On Availability Groups on Oracle Cloud Infrastructure (OCI) delivers a robust, centralized solution for High Availability (HA) and Disaster Recovery (DR), minimizing downtime and boosting resilience through expandable multi-node clusters that leverage OCI’s powerful infrastructure. For organizations migrating on-premises SQL Server setups, OCI offers transformative benefits like seamless transitions with minimal disruption via automatic seeding, failover capabilities, Marketplace Images, or Bring Your Own License (BYOL), while optimizing performance, compliance, and scalability in regulated sectors without major overhauls. If you’re already on OCI, integration is straightforward with pre-configured networking, security lists, Bastion VMs, familiar Windows/PowerShell tools, Active Directory support, and virtual listeners for uninterrupted application connectivity—freeing teams to innovate rather than manage complexity. Overall, this approach provides superior performance via multi-subnet optimizations, enhanced security through private subnets, effortless scalability, significant cost savings, and seamless OCI ecosystem integration, empowering faster ROI, reduced risks, and future-proof data strategies.

What This Covers
Oracle Official Steps + Your Real-World Troubleshooting Experience
✔ OCI multi-subnet Always On Availability Group deployment walkthrough
✔ Step-by-step tasks from Oracle documentation
✔ Incident: Role down & “5035 no matching interface”
✔ Root cause & fix
Technology Stack (At a Glance)
🖥 OS : Windows Server 2025 (Standard)
🗄 Database : SQL Server 2022 (Enterprise)
☁ Platform : Oracle Cloud Infrastructure (OCI)
🔁 HA Model : Always On Availability Groups
🧩 Cluster : Windows Server Failover Cluster (WSFC)
Architecture & Prerequisites
[VCN] → [3 Private Subnets]
A: SQL Node 1
B: SQL Node 2
C: Quorum Witness

✔ Each SQL VM in a separate private subnet (recommended by Microsoft/Oracle)
✔ Active Directory domain present
✔ Bastion host for private SSH/RDP access
✔ Security list rules: port 1433 (SQL), 5022 (AG endpoint)
Task Flow (From Oracle)
Task 1 → Task 2 → Task 3 → Task 4 → Task 5
Task 1 — Configure Secondary IPs (OCI)
[Primary IP]
+ Secondary IP for WSFC
+ Secondary IP for AG Listener
Steps (OCI Console):
1. Go to VM → Attached VNIC → IPv4 Addresses
2. Add WSFC IP
3. Add AG Listener IP
4. Repeat for second SQL node
Checklist
• Node1 Database Server in a private subnet
o Primary IP Operating System access
o Secondary IP 1: Windows Server Failover Cluster IP.
o Secondary IP 2: SQL Server Always On availability groups listener.
• Node2 Database Server in a private subnet
o Primary IP Operating System access
o Secondary IP 1: Windows Server Failover Cluster IP.
o Secondary IP 2: SQL Server Always On availability groups listener.
• Node3 Quorum VM in a private subnet.
o Primary IP Operating System access
Task 2 — Create SQL Server Service Account (AD)
Domain Admin → Create new SQL Service Account
Steps:
1. Log into Domain Controller
2. Open AD → Users → New User
3. Enter account & password
4. Add to local admin & SQL sysadmin roles
Task 3 — Configure Quorum File Share
[Quorum VM]
→ Create folder
→ Share (Everyone or SQL Service Account)
Steps:
1. Log into Quorum VM
2. Create share folder
3. Set sharing permissions
Task 4 — Windows Server Failover Cluster (WSFC)
4.1 Add Failover Clustering Feature
SQL Node 1 & SQL Node 2
Open Server Manager → Add roles/features → Failover Clustering
Create the Cluster (Official)
Wizard Flow
Open Failover Cluster Manager
→ Create Cluster
→ Select Both Nodes
→ Run Validation
→ Name & Finish
Expected Outcome
A cluster is created, but initially offline core resources
⛔ Your Real-World Issue
Cluster role status: ❌ DOWN
Error:
“No matching network interface found
for resource ‘Cluster IP Address’
IP ‘10.0.1.x’ (5035)”
This happens even when:
✔ Network design is correct
✔ Security rules are correct
✔ Cross-subnet communication works
This is not always a network problem more on that below.
Fixing the Cluster (Your Diagnosis + Root Cause)
Root cause: SQL Servers were NOT on the latest patch levels
Steps taken:
Upgrade SQL Server Node1
Upgrade SQL Server Node2
Revalidate WSFC
Verify Cluster Role → ONLINE
Result
Before: Cluster Role ❌ Down
After: Cluster Role ✅ Online
Why It Happened
Outdated SQL builds can cause Windows Failover Cluster resource instabilities, especially when:
✔ Using Always On AG
✔ Multi-subnet environments
Pro Tip
Always patch SQL Server CU BEFORE WSFC configuration
Task 4 Continued — Configure Cluster IPs & Quorum
From Oracle Official Steps
1. Configure quorum witness (File Share)
2. Set cluster IPs with secondary private IPs added in Task 1
Visual:
Configure Cluster Quorum → File Share
Assign WSFC IP → Bring Cluster Online
Check:
• Cluster Quorum Configured
• WSFC Secondary IPs Assigned
• Cluster Name/Computer Object Created
Task 5 — Always On Availability Groups Setup
5.1 Grant Permissions
Add cluster computer object permission in AD:
Cluster → Security → Allow: Create Computer Objects
(Required for AG listener and DNS actions)
Enable Always On & Configure AG
5.2 Enable Always On
SQL Configuration Manager
→ SQL Server Service Properties
→ Enable Always On
→ Restart SQL Server service
Repeat on both nodes
5.3 Create & Backup Sample DB
SSMS → Create database
→ Full backup
Required before AG creation
5.4 Create Availability Group
SSMS → New Availability Group Wizard
→ Add Replicas
→ Select Sync/Failover Mode
→ Finish
Listener steps skipped here; configured separately
5.5 Add AG Listener
Add Listener:
→ DNS name
→ Static IPs (per subnet)
Include both listener IPs created earlier
Final Flow (All In One)
1. OCI VNIC Secondary IPs
2. AD Service Account
3. Quorum File Share
4. Enable WSFC
5. Create WSFC (Cluster)
6. Fix: SQL Patch Update → Cluster Online
7. Grant AD Permissions
8. Enable Always On
9. Create Sample DB
10. Create AG
11. Add AG Listener
Key Lessons
Category Key Takeaway
Networking Multi-subnet requires correct IPs but not always the cause of 5035
Patching Always match SQL Server cumulative updates on all nodes
Cluster WSFC will fail silently if SQL build mismatch
AG Listener needs static IPs on each subnet

