Fix Always ON Connection Timeout Error 35206 in SQL Server
Recently, my company has migrated all critical SAP systems from on prem Linux based Oracle databases to Azure Cloud windows based SQL Server. These are very high end machines on which very large SQL Server databases are running with Alwayson configuration between 3 replicas. These databases are highly transactional, very large in size approximately more than 10 TB and these are configured to run critical SAP systems. Since few weeks, we have started facing very strange issue of connection timeout ( SQL error 35206 ) on one of our secondary availability replica. Below is full text of this SQL Server error 35206.
A connection timeout has occurred on a previously established connection to availability replica ‘manvendradbggp03’ with id [availability group id]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
You may also receive similar issue under different alwayson error 35201. Below error will generate while attempting to establish the connection to availability group but failed because of connection timeout error 35206. The error text of this SQL error 35201 is given below.
A connection timeout has occurred while attempting to establish a connection to availability replica ‘manvendradbggp03’ with id [availability group id]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
If you are always getting above issues and unable to establish any connection to its respective availability replica then you should immediately look in to network side whether there is poor network latency, any network packet drop issue or firewall is blocking some connections or not as suggested in error code. But if you are getting this issue on intermittent basis then you should keep reading this article to understand more about alwayson error 35206.
SQL Server Error 35206 and Error 35201 in Always on
Both errors come because of connection timeout error between their respective replicas. Microsoft has categorized these errors as product issue and released cumulative updates to fix them. You should also consider these cumulative updates as per your SQL Server version. These issues will be very problematic if you will not address them on time because of below reasons:
- Availability databases will stop synchronizing. This could be a risk of potential data loss.
- Log file will start growing on your primary replica. This could be a potential risk for an outage if we will not address this issue on time.
- Frequent repetitive alerts for your critical systems
You must fix this issue by applying the cumulative update to avoid above problematic scenarios. But if you already have applied these updates and are still facing these issues then i would recommend you to read below section to try out other options.
Fix Always on Connection Timeout SQL Error 35206
Generally, this problem encounters only on high end systems where you have more than 24 cores CPU and SQL Server is highly transactional. One very simple workaround to overcome this issue is to restart SQL Server service on your secondary replica where you are getting this issue. Don’t restart SQL Server on primary replica. Restarting SQL Server on impacted availability replica will surely fix the issue for the time being.
Additionally, you should consider below checks to review and make changes accordingly as per your system design if you are still getting this issue on frequent basis.
Use Default Network Packet Size
Review network packet size configuration in SQL Server. Microsoft suggested to keep network packet size configuration with default value but sometime we make changes as per our need and forgot to revert these changes to their original values so i would recommend to revert this change to its default value. Read below article if you want to learn how to change network packet size configuration in SQL Server.
You should also consider evaluating your network bandwidth and firewall settings to make sure and isolate the issue. I would suggest you to work with network engineer on this front to review network or firewall related configurations.
Review Connection Timeout Setting
Another option i would recommend you to evaluate and review the session or connection time out settings. Make sure that session timeout setting for your always on configuration must be more than or equal to 10 seconds. You can consider increasing session time out if you have slow network between your replicas. Although if you are running highly transactional databases on high end box i assume you already have good network bandwidth but nothing wrong in reviewing the settings. Read below article if you want to reconfigure or change session time out in always on availability group.
If you are running your always on availability group configuration between multi subnet environment then i would recommend you to read attached Microsoft article to fix connection timeout errors in multi subnet availability groups.
- Fix SQL Server Error 976: Cannot connect to Secondary Replica of AlwaysON Availability Group
- SQL Server Alwayson Error 35250: Joining database on Secondary Replica resulted in an error
- Fix SQL Server Error 19471: Listener issue that came during Configuring SQL Server AlwaysOn Availability Group
- SQL Server Alwayson Interview Questions & Answers
- Fix SQL Error 18456: failed to open the explicitly specified database - September 18, 2021
- Fix Always ON Connection Timeout Error 35206 in SQL Server - July 23, 2021
- How to Enable Preview Features in Azure Data Studio - July 15, 2021