Azure Synapse Analytics Maintenance


  • Maintenance of Azure Synapse Analytics
  • Impact of maintenance
  • Maintenance notifications
  • Dealing with maintenance

This article takes a deep dive into the maintenance of Azure Synapse Analytics.

Maintenance of Azure Synapse Analytics

Maintenance involves the deployment of new features, upgrades and fixes.

There are two types of maintenance.

  • Planned maintenance (regular software upgrades)
  • Unplanned maintenance (security-related updates that need to be applied urgently)

Planned maintenance is carried out within a pre-defined maintenance schedule (specified days and times).

The maintenance schedule feature is intended for DW500c and higher; it is not available for DW400c and lower.
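A weekly schedule of "specified days and times" can be modeled on the client side so that jobs know whether they are about to run inside the window. The following is a minimal sketch, not part of any Azure SDK: the function name, the example window (Saturday 22:00 for 8 hours) and the use of naive datetimes in a single time zone are all assumptions made for illustration.

```python
from datetime import datetime, time, timedelta


def in_maintenance_window(now, weekday, start, duration_hours):
    """Return True if `now` falls inside a weekly maintenance window.

    weekday uses the datetime.weekday() convention (0 = Monday, 6 = Sunday).
    All values are naive datetimes assumed to share one time zone.
    """
    # Find the most recent window start that is not later than `now`.
    days_back = (now.weekday() - weekday) % 7
    window_start = datetime.combine(now.date() - timedelta(days=days_back), start)
    if window_start > now:
        # Today is the window's weekday but the window has not opened yet,
        # so the most recent start was one week earlier.
        window_start -= timedelta(days=7)
    return now < window_start + timedelta(hours=duration_hours)


# Hypothetical window: every Saturday at 22:00, lasting 8 hours.
if in_maintenance_window(datetime.now(), weekday=5, start=time(22, 0), duration_hours=8):
    print("Inside the maintenance window: postpone long-running jobs.")
```

A scheduler could call a check like this before starting a long load and postpone the job until the window has passed.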

Unplanned maintenance may be carried out without warning, and urgent security maintenance is applied as soon as possible.

Advance notification may be given for either type, but it is not guaranteed for every event.

Impact of maintenance

During maintenance, the service briefly switches over to hardware resources to which the update has already been applied.

Logins and queries that are in progress at that moment may fail with connection errors.

However, Azure Synapse Analytics is configured redundantly in all environments by default.
As a result, downtime during maintenance lasts only for the short time it takes to complete the role change.
The service is not unavailable for the whole maintenance period.

Because Azure Synapse Analytics is a PaaS service, security patching and version upgrades are applied automatically and the user does not need to manage them. It is still very important, however, to deal with the downtime that maintenance causes.

Maintenance notifications

For performance levels of DW400c and lower, no notification is given for maintenance that takes place outside the specified hours.

A 24-hour advance notification precedes all maintenance events that aren’t for the DW400c and lower tiers.

Reference: Maintenance schedules for Synapse SQL pool – Azure Synapse Analytics | Microsoft Docs

For DW500c and higher, all maintenance events, including those outside the specified hours, are notified in advance. The exception is emergency security maintenance, for which a notification may not be issued.
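The tier rule described in this section can be written down as a small helper, for example to decide how defensively a client should be configured. This is only a sketch of the rule as stated in this article; the function names are hypothetical, and the parser assumes service levels written in the `DW<number>c` form.

```python
import re


def parse_dwu(service_level):
    """Extract the numeric part of a service level such as 'DW500c'."""
    match = re.fullmatch(r"DW(\d+)c", service_level.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized service level: {service_level!r}")
    return int(match.group(1))


def advance_notice_expected(service_level):
    """Return True for DW500c and higher, where maintenance events are
    normally announced in advance. Emergency security maintenance may
    still arrive without notice, so this is never a guarantee."""
    return parse_dwu(service_level) >= 500
```

Whatever the tier, the client should still handle connection errors, since emergency maintenance can skip the notification.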

Dealing with maintenance

The following two measures are recommended for handling the temporary errors associated with maintenance.

  • Implement retry logic
  • Shorten transactions

Implement retry logic

Implementing retry logic is recommended to reduce the impact of connection errors caused by maintenance.

Although the interrupted process is cancelled and rolled back, the error can often be resolved simply by retrying.

Retry logic for transient errors
Client programs that occasionally encounter a transient error are more robust when they contain retry logic. When your program communicates with your database in SQL Database through third-party middleware, ask the vendor whether the middleware contains retry logic for transient errors.

Source: Working with transient errors – Azure SQL Database | Microsoft Docs

As noted above, the actual downtime is short, less than a minute in most cases. However, all running and uncompleted processes are cancelled and rolled back when the failover associated with the switchover takes place.

It is therefore recommended to establish a mechanism that retries automatically when an error is detected.

It is also important to limit the number of retries in line with your requirements rather than retrying indefinitely, because a major failure or a problem in the client-side configuration can cause downtime that lasts much longer.
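The idea of automatic but bounded retries can be sketched as follows. This is a minimal illustration rather than a definitive implementation: `TransientError` is a hypothetical stand-in for whatever connection exception your database driver raises, and the attempt limit and delays are arbitrary example values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's connection error."""


def run_with_retry(operation, max_attempts=5, base_delay=1.0,
                   max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The whole operation is re-run on each attempt, because the failed
    attempt has already been cancelled and rolled back by the service.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Give up: the outage is longer than a brief switchover,
                # so surface the error instead of retrying forever.
                raise
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # with a little jitter so clients do not reconnect in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

In real code, `operation` would open a connection and run the query, and the `except` clause would catch only the error codes that the driver documents as transient.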

Shorten transactions

To reduce the impact of connection errors caused by maintenance, it is recommended to keep each transaction or process as small as possible during maintenance.

Keeping the transaction unit small has the following two advantages when a connection error occurs.

  • Reduced time to re-execute
  • Reduced time for rollback

Long transactions, on the other hand, can lead to longer downtime.

If a long-running process fails in the middle of a transaction, all the changes it has made so far must be rolled back, and the database is not ready for use until that recovery is complete.

Therefore, during maintenance, avoid long transactions or shorten each process as much as possible.

Although a maintenance window can be between three and eight hours this does not mean the data warehouse will be offline for the duration. Maintenance can occur at any time within that window and you should expect a single disconnect during that period lasting ~5 -6 mins as the service deploys new code to your data warehouse. DW400c and lower may experience multiple brief losses in connectivity at various times during the maintenance window. When maintenance starts, all active sessions will be canceled, and non-committed transactions will be rolled back. To minimize instance downtime, make sure that your data warehouse has no long-running transactions before your chosen maintenance period.

Reference: Maintenance schedules for Synapse SQL pool – Azure Synapse Analytics | Microsoft Docs
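One way to keep transactions short is to split a large load into batches and commit after each one, so that a disconnect rolls back only the batch in flight. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a Synapse connection; the `sales` table, the `BatchLoadError` class and the batch size are assumptions made for illustration.

```python
import sqlite3


class BatchLoadError(Exception):
    """Raised when a load is interrupted; records how many rows are safe."""

    def __init__(self, committed):
        super().__init__(f"load interrupted after {committed} committed rows")
        self.committed = committed


def load_in_batches(conn, rows, batch_size=1000, start_at=0):
    """Insert rows in small transactions, one commit per batch.

    Returns the total number of committed rows. On failure, only the
    current batch is rolled back and BatchLoadError.committed tells the
    caller where to resume (pass it back in as start_at).
    """
    committed = start_at
    for start in range(start_at, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            conn.executemany(
                "INSERT INTO sales (id, amount) VALUES (?, ?)", batch
            )
            conn.commit()  # short transaction: at most batch_size rows
        except sqlite3.Error as exc:
            conn.rollback()  # undo only the unfinished batch
            raise BatchLoadError(committed) from exc
        committed += len(batch)
    return committed
```

With this shape, both the time to roll back and the time to re-execute are bounded by the batch size rather than by the size of the whole load.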

Summary

While security patching and version upgrades are all performed automatically in Azure Synapse Analytics as a PaaS service, it is difficult to control the downtime associated with maintenance.

Therefore, temporary errors associated with maintenance should be addressed from the following two perspectives.

  • Implement retry logic
  • Shorten transactions
