Document: Oracle Data Guard Switchover Meeting Notes
Introduction
This document captures the key points and discussions from the recent meeting regarding Oracle Data Guard switchover procedures, known issues, and troubleshooting steps.
Meeting Summary
The meeting focused on the Oracle Data Guard switchover process, particularly addressing issues encountered in previous versions and the steps to ensure a successful switchover. The discussion included the automation of the switchover process, the role of OEM (Oracle Enterprise Manager), and the importance of pre-checks and post-checks.
Key Points Discussed
Historical Issues with Switchover:
- Oracle 12.1: There were frequent issues where the database on the other side (standby) would not come up as the primary after a switchover. Errors were encountered in the switchover script, and manual intervention was often required.
- Oracle 12.2: Improvements were noted. The broker would still report that it could not start the database on the other side, but the database would then start automatically and the cluster would complete the transition without manual intervention.
Switchover Job Submission:
- The switchover process involves submitting a job with the primary and standby names matching the OEM targets. This is crucial as the process runs against the OEM configuration.
- The job is initially submitted with the primary and standby names, and the status is updated in a specific table.
Automation and Monitoring:
- An automated job runs every three minutes to check the status of the switchover job. If the job is in a running state, it will continue to monitor and update the status.
- The script used for the switchover is copied to the standby server and executed there. The log files generated provide details on the success or failure of the switchover.
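The three-minute monitoring loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual OEM job: `get_status` is a placeholder that reads a file, where the real job would query the status table.

```shell
#!/bin/bash
# Sketch of the 3-minute switchover-status monitor (illustrative names only).
get_status() {
    # Placeholder: the real job would query the status table, e.g. via
    # sqlplus. Reading a file keeps the sketch self-contained.
    cat "$1"
}

monitor() {
    local status_file=$1
    local interval=${2:-180}    # 180 seconds = 3 minutes
    # Keep polling while the switchover job is still running.
    while [ "$(get_status "$status_file")" = "RUNNING" ]; do
        sleep "$interval"
    done
    # Report the final state (e.g. SUCCESS or FAILED).
    get_status "$status_file"
}
```

In the real environment the polling would be scheduled (for example from cron every three minutes) rather than run as a blocking loop.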
Troubleshooting Common Issues:
- Firewall Issues: Ensure that the firewall is open between the OEM server and the database servers to allow the switchover script to update the status.
- Log Files: If the log files are not updated, check the standby host for the script execution status and logs.
- Primary and Standby Names: Ensure that the job is submitted with the correct primary and standby names as per the OEM configuration.
Known Issues and Solutions:
- Switchover Failure: In some cases, both databases may be left in standby mode. Manually force the primary database to become the primary again and reconfigure Data Guard.
- Lag Issues: Ensure that the lag between the primary and standby databases is within acceptable limits before initiating the switchover.
- Invalid Objects: Compile all invalid objects before performing the switchover.
- Datafile Status: Ensure all datafiles are online before performing the switchover.
- Flashback: Verify that Flashback is enabled before performing the switchover.
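The invalid-object, datafile, flashback, and switchover-readiness checks above can be scripted in a few queries. A minimal sketch, assuming OS authentication as the oracle user on the primary (it cannot run outside a database host):

```shell
#!/bin/bash
# Hedged sketch of the pre-switchover checks listed above.
sqlplus -s / as sysdba <<'EOF'
SET PAGESIZE 0 FEEDBACK OFF HEADING OFF
-- Invalid objects: compile (e.g. utlrp.sql) before switchover if non-zero
SELECT 'INVALID OBJECTS: ' || COUNT(*) FROM DBA_OBJECTS WHERE STATUS = 'INVALID';
-- Offline datafiles: bring online before switchover
SELECT 'OFFLINE DATAFILES: ' || COUNT(*) FROM V$DATAFILE WHERE STATUS = 'OFFLINE';
-- Flashback should be YES
SELECT 'FLASHBACK: ' || FLASHBACK_ON FROM V$DATABASE;
-- Switchover readiness on the primary: TO STANDBY or SESSIONS ACTIVE
SELECT 'SWITCHOVER STATUS: ' || SWITCHOVER_STATUS FROM V$DATABASE;
EOF
```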
Best Practices:
- Pre-Checks: Perform comprehensive pre-checks to ensure the environment is ready for the switchover.
- Post-Checks: Conduct thorough post-checks to validate the success of the switchover and ensure the new primary and standby databases are functioning correctly.
- OEM Integration: Ensure that the OEM configuration is accurate and that the switchover job is submitted with the correct details.
Action Items
- Firewall Configuration: Verify and ensure that the firewall settings allow communication between the OEM server and the database servers.
- Log File Monitoring: Regularly monitor the log files for any errors or issues during the switchover process.
- Training and Knowledge Sharing: Include key team members in troubleshooting sessions to ensure they are familiar with the switchover process and common issues.
Conclusion
The meeting highlighted the importance of thorough pre-checks, post-checks, and monitoring to ensure a successful Oracle Data Guard switchover. By addressing known issues and following best practices, the team can minimize disruptions and ensure a smooth transition during the switchover process.
Document: Oracle Data Guard Switchover Meeting Notes (Continued)
Introduction
This document captures the key points and discussions from the recent meeting regarding Oracle Data Guard switchover procedures, known issues, and troubleshooting steps. This section continues from the previous meeting notes, focusing on additional issues, listener configurations, and service management.
Meeting Summary
The meeting continued to address the Oracle Data Guard switchover process, particularly focusing on listener configurations, service management, and known issues that can cause switchover failures. The discussion included the importance of ensuring proper listener configurations, handling service triggers, and troubleshooting common issues.
Key Points Discussed
Listener Configuration Issues:
- Single Instance Listener: The listener needs to be up on the standby side after the switchover. In previous configurations, the Oracle home was included in the listener configuration, which caused issues during the switchover.
- Fixing Listener Issues: When encountering errors related to the listener, remove the Oracle home entry from the listener configuration and reload the listener. This ensures that the listener points to the correct Oracle home version.
Common Listener Errors:
- Oracle Home Entry: Errors often occur when the listener configuration points to an outdated Oracle home (e.g., 18c instead of 19c). Removing the outdated entry and reloading the listener resolves the issue.
- Manual Restart: After fixing the listener configuration, manually restart the listener to ensure the database role change is recognized.
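A sketch of the listener fix described above. The listener name and Oracle home paths are examples, not the actual environment:

```shell
# Back up listener.ora before editing (paths are illustrative).
cd $ORACLE_HOME/network/admin
cp listener.ora listener.ora.bak

# Edit listener.ora and remove or correct any stale entry such as
#   (ORACLE_HOME = /u01/app/oracle/product/18.0.0/dbhome_1)
# in SID_LIST_LISTENER so it points at the 19c home, then reload:
lsnrctl reload LISTENER
lsnrctl status LISTENER
```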
Service Management:
- Service Triggers: Services are managed through triggers that start and stop services based on the database role. For example, after a role change, the primary database will stop read-only services and start read-write services, while the standby database will do the opposite.
- Service Creation: During provisioning, services are created for application connections and Data Guard management. Ensure that these services are correctly configured and associated with the appropriate triggers.
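The role-based trigger mechanism described above is commonly implemented with an AFTER STARTUP trigger and the DBMS_SERVICE package. A minimal sketch, with hypothetical service names (`app_rw`, `app_ro`); the actual triggers from the meeting are not reproduced here, and the trigger owner needs a direct grant on V$DATABASE:

```shell
sqlplus -s / as sysdba <<'EOF'
-- Hypothetical role-based service trigger: start the read-write service on
-- the primary and the read-only service on the standby after each startup.
CREATE OR REPLACE TRIGGER manage_role_services
AFTER STARTUP ON DATABASE
DECLARE
  v_role V$DATABASE.DATABASE_ROLE%TYPE;
BEGIN
  SELECT DATABASE_ROLE INTO v_role FROM V$DATABASE;
  IF v_role = 'PRIMARY' THEN
    DBMS_SERVICE.START_SERVICE('app_rw');  -- read-write service (example name)
  ELSE
    DBMS_SERVICE.START_SERVICE('app_ro');  -- read-only service (example name)
  END IF;
END;
/
EOF
```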
Known Issues and Solutions:
- Switchover Failure: Common issues include both databases being left in standby mode. Manually force the primary database to become the primary again and reconfigure Data Guard.
- Listener Configuration: Ensure that the listener configuration is updated to reflect the correct Oracle home version. This is a frequent source of errors during switchover.
- Service Triggers: Verify that service triggers are correctly configured and updated during provisioning and migration. This ensures that services are started and stopped correctly based on the database role.
Best Practices:
- Pre-Checks: Perform comprehensive pre-checks to ensure the environment is ready for the switchover, including verifying listener configurations and service triggers.
- Post-Checks: Conduct thorough post-checks to validate the success of the switchover and ensure the new primary and standby databases are functioning correctly.
- OEM Integration: Ensure that the OEM configuration is accurate and that the switchover job is submitted with the correct details.
Troubleshooting Common Issues:
- Firewall Issues: Ensure that the firewall is configured to allow communication between the OEM server and the database servers.
- Log Files: If the log files are not updated, check the standby host for the script execution status and logs.
- Primary and Standby Names: Ensure that the job is submitted with the correct primary and standby names as per the OEM configuration.
- Service Issues: If services are not starting correctly, check the service triggers and ensure they are correctly configured.
Listener and Service Configuration:
- Listener Configuration: Ensure that the listener configuration does not include outdated Oracle home entries. Update and reload the listener configuration as needed.
- Service Triggers: Verify that service triggers are correctly configured to start and stop services based on the database role. This is particularly important for new databases and during migrations.
Provisioning and Migration:
- Service Configuration: During provisioning and migration, ensure that service triggers are correctly configured and updated. This includes creating new services and updating existing ones.
- Testing: Test the switchover process thoroughly, including verifying listener configurations and service triggers. This helps identify and resolve issues before they impact production.
Action Items
- Firewall Configuration: Verify and ensure that the firewall settings allow communication between the OEM server and the database servers.
- Log File Monitoring: Regularly monitor the log files for any errors or issues during the switchover process.
- Listener Configuration: Ensure that the listener configuration is updated to reflect the correct Oracle home version.
- Service Trigger Verification: Verify that service triggers are correctly configured and updated during provisioning and migration.
- Training and Knowledge Sharing: Include key team members in troubleshooting sessions to ensure they are familiar with the switchover process and common issues.
Conclusion
The meeting highlighted the importance of thorough pre-checks, post-checks, and monitoring to ensure a successful Oracle Data Guard switchover. By addressing known issues and following best practices, the team can minimize disruptions and ensure a smooth transition during the switchover process. Proper listener and service configuration, along with comprehensive testing, are critical to the success of the switchover.
This document captures the key points and discussions from the meeting, providing a reference for the Oracle Data Guard switchover process and troubleshooting steps.
Document: Oracle Data Guard Switchover Meeting Notes (Final Portion)
Introduction
This document captures the key points and discussions from the recent meeting regarding Oracle Data Guard switchover procedures, known issues, and troubleshooting steps. This section continues from the previous meeting notes, focusing on the creation of a checklist, verification steps, and additional automation details.
Meeting Summary
The meeting continued to address the Oracle Data Guard switchover process, particularly focusing on the creation of a checklist for pre-checks and post-checks, verification steps after switchover, and the automation process for managing switchover jobs.
Key Points Discussed
Creation of a Checklist:
- Pre-Check and Post-Check Checklist: It was suggested to create a comprehensive checklist to be used before and after the switchover. This checklist would include all necessary steps to ensure a smooth transition.
- Human Verification: Despite automation, human verification is still necessary. The checklist should include steps for manually verifying critical configurations, such as service names and listener configurations.
Verification Steps After Switchover:
- Log Files: After the switchover, it is essential to check the log files for any errors or issues. This includes both the alert logs and the switchover logs.
- Database Health: Verify the health of the database by checking the status of datafiles, services, and managed recovery processes.
- Shift Handover: The operations team will take over after the initial verification. This ensures continuous monitoring and handling of any issues that may arise.
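The database-health checks above reduce to a few queries on each side. A sketch, assuming OS authentication on the database host:

```shell
sqlplus -s / as sysdba <<'EOF'
SET PAGESIZE 0 FEEDBACK OFF HEADING OFF
-- Role and open mode: expect PRIMARY / READ WRITE on the new primary and
-- PHYSICAL STANDBY / MOUNTED (or READ ONLY WITH APPLY) on the new standby.
SELECT DATABASE_ROLE || ' / ' || OPEN_MODE FROM V$DATABASE;
-- Offline datafiles should be zero.
SELECT 'OFFLINE DATAFILES: ' || COUNT(*) FROM V$DATAFILE WHERE STATUS = 'OFFLINE';
-- On the new standby, managed recovery (MRP) should be running.
SELECT 'MRP PROCESSES: ' || COUNT(*) FROM V$MANAGED_STANDBY WHERE PROCESS LIKE 'MRP%';
EOF
```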
Automation and Job Management:
- Job Submission: The switchover process involves submitting a job with the primary and standby names matching the OEM targets. The job is initially submitted with the primary and standby names, and the status is updated in a specific table.
- Status Monitoring: An automated job runs every three minutes to check the status of the switchover job. If the job is in a running state, it will continue to monitor and update the status.
- Script Execution: The script used for the switchover is copied to the standby server and executed there. The log files generated provide details on the success or failure of the switchover.
Common Issues and Solutions:
- Listener Configuration: Ensure that the listener configuration does not include outdated Oracle home entries. Update and reload the listener configuration as needed.
- Service Triggers: Verify that service triggers are correctly configured and updated during provisioning and migration. This ensures that services are started and stopped correctly based on the database role.
- Firewall Issues: Ensure that the firewall is configured to allow communication between the OEM server and the database servers.
- Log File Monitoring: Regularly monitor the log files for any errors or issues during the switchover process.
Best Practices:
- Pre-Checks: Perform comprehensive pre-checks to ensure the environment is ready for the switchover, including verifying listener configurations and service triggers.
- Post-Checks: Conduct thorough post-checks to validate the success of the switchover and ensure the new primary and standby databases are functioning correctly.
- OEM Integration: Ensure that the OEM configuration is accurate and that the switchover job is submitted with the correct details.
Additional Automation Details:
- Script Details: The switchover script takes arguments for the primary and standby database names and performs the necessary steps to switch roles. The script updates the status in the database and generates logs for monitoring.
- Failover Script: A failover script is also available for use in disaster recovery scenarios. This script can be used to automate the failover process if needed.
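For reference, a broker-driven failover (disaster recovery only, not the routine switchover path) comes down to a few DGMGRL commands. The database names below are placeholders, and DGMGRL will prompt for the SYS password:

```shell
# Placeholder names; run from a host that can reach the surviving standby.
dgmgrl sys@standby_db <<'EOF'
FAILOVER TO standby_db;
SHOW CONFIGURATION;
EOF
# After a failover, the old primary typically needs reinstatement, which
# requires flashback to be enabled:  REINSTATE DATABASE old_primary_db;
```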
Action Items
- Create a Checklist: Develop a comprehensive checklist for pre-checks and post-checks to ensure a smooth switchover process.
- Verify Listener Configuration: Ensure that the listener configuration is updated to reflect the correct Oracle home version.
- Monitor Log Files: Regularly monitor the log files for any errors or issues during the switchover process.
- Verify Service Triggers: Ensure that service triggers are correctly configured and updated during provisioning and migration.
- Training and Knowledge Sharing: Include key team members in troubleshooting sessions to ensure they are familiar with the switchover process and common issues.
Conclusion
The meeting highlighted the importance of thorough pre-checks, post-checks, and monitoring to ensure a successful Oracle Data Guard switchover. By addressing known issues and following best practices, the team can minimize disruptions and ensure a smooth transition during the switchover process. Proper listener and service configuration, along with comprehensive testing, are critical to the success of the switchover.
This document captures the key points and discussions from the meeting, providing a reference for the Oracle Data Guard switchover process and troubleshooting steps.
Shell Script for Pre-Checks
#!/bin/bash
# Data Guard switchover pre-check script.
# Usage: ./dataguard_precheck.sh [RAC]
LOGFILE="dataguard_precheck.log"
exec > >(tee -i "$LOGFILE")
exec 2>&1

timestamp() {
    date +"%Y-%m-%d %H:%M:%S"
}

log() {
    echo "$(timestamp) - $1"
}

log "Starting Data Guard Pre-Check Script"

# Function to check Data Guard configuration
check_dg_config() {
    log "Checking Data Guard Configuration..."
    dgmgrl -silent / "show configuration" | grep -i "SUCCESS"
    if [ $? -ne 0 ]; then
        log "Data Guard Configuration Check Failed"
        exit 1
    fi
    log "Data Guard Configuration Check Passed"
}

# Function to verify archive log shipping
check_archive_logs() {
    log "Verifying Archive Log Shipping..."
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT DEST_ID, STATUS FROM V\$ARCHIVE_DEST_STATUS WHERE STATUS='VALID';
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Archive Log Shipping Check Failed"
        exit 1
    fi
    log "Archive Log Shipping Check Passed"
}

# Function to check database role
check_db_role() {
    log "Checking Database Role..."
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT DATABASE_ROLE FROM V\$DATABASE;
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Database Role Check Failed"
        exit 1
    fi
    log "Database Role Check Passed"
}

# Function to check Data Guard Broker configuration
# (same dgmgrl check as check_dg_config, kept as a separately logged step)
check_dg_broker() {
    log "Checking Data Guard Broker Configuration..."
    dgmgrl -silent / "show configuration" | grep -i "SUCCESS"
    if [ $? -ne 0 ]; then
        log "Data Guard Broker Configuration Check Failed"
        exit 1
    fi
    log "Data Guard Broker Configuration Check Passed"
}

# Function to check for active sessions
check_active_sessions() {
    log "Checking for Active Sessions..."
    # The count is informational; apply a threshold suited to your environment.
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT COUNT(*) FROM V\$SESSION WHERE STATUS='ACTIVE';
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Active Sessions Check Failed"
        exit 1
    fi
    log "Active Sessions Check Passed"
}

# Function to take backup
take_backup() {
    log "Taking Backup of Primary Database..."
    rman target / <<EOF
BACKUP DATABASE PLUS ARCHIVELOG;
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Backup Failed"
        exit 1
    fi
    log "Backup Completed Successfully"
}

# Function to check cluster status (RAC only)
check_cluster_status() {
    log "Checking Cluster Status..."
    crsctl check cluster -all
    if [ $? -ne 0 ]; then
        log "Cluster Status Check Failed"
        exit 1
    fi
    log "Cluster Status Check Passed"
}

# Function to check services (RAC only)
check_services() {
    log "Checking Services..."
    # srvctl -d expects the db_unique_name; adjust if it differs from ORACLE_SID.
    srvctl status service -d "$ORACLE_SID"
    if [ $? -ne 0 ]; then
        log "Services Check Failed"
        exit 1
    fi
    log "Services Check Passed"
}

# Function to check interconnect (RAC only)
check_interconnect() {
    log "Checking Interconnect..."
    oifcfg getif
    if [ $? -ne 0 ]; then
        log "Interconnect Check Failed"
        exit 1
    fi
    log "Interconnect Check Passed"
}

# Function to check OCR and voting disk (RAC only)
check_ocr_voting_disk() {
    log "Checking OCR and Voting Disk..."
    ocrcheck
    if [ $? -ne 0 ]; then
        log "OCR Check Failed"
        exit 1
    fi
    crsctl query css votedisk
    if [ $? -ne 0 ]; then
        log "Voting Disk Check Failed"
        exit 1
    fi
    log "OCR and Voting Disk Check Passed"
}

# Function to check for lag
check_lag() {
    log "Checking for Lag..."
    # Fail only if the broker reports a lag warning:
    # ORA-16853 (apply lag exceeded) or ORA-16855 (transport lag exceeded).
    # (The original version failed whenever no "lag" line appeared at all.)
    if dgmgrl -silent / "show configuration" | grep -iE "ORA-16853|ORA-16855"; then
        log "Lag Check Failed"
        exit 1
    fi
    log "Lag Check Passed"
}

# Function to check for invalid objects
check_invalid_objects() {
    log "Checking for Invalid Objects..."
    # Compile invalid objects (e.g. @?/rdbms/admin/utlrp.sql) if the count is non-zero.
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT COUNT(*) FROM DBA_OBJECTS WHERE STATUS='INVALID';
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Invalid Objects Check Failed"
        exit 1
    fi
    log "Invalid Objects Check Passed"
}

# Function to check for offline datafiles
check_offline_datafiles() {
    log "Checking for Offline Datafiles..."
    # DBA_DATA_FILES.ONLINE_STATUS (not STATUS) carries the ONLINE/OFFLINE state.
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT FILE_NAME FROM DBA_DATA_FILES WHERE ONLINE_STATUS IN ('OFFLINE', 'RECOVER');
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Offline Datafiles Check Failed"
        exit 1
    fi
    log "Offline Datafiles Check Passed"
}

# Function to check flashback status (FLASHBACK_ON should be YES)
check_flashback() {
    log "Checking Flashback Status..."
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT FLASHBACK_ON FROM V\$DATABASE;
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Flashback Status Check Failed"
        exit 1
    fi
    log "Flashback Status Check Passed"
}

# Function to check switchover status
# (on the primary, expect TO STANDBY or SESSIONS ACTIVE)
check_switchover_status() {
    log "Checking Switchover Status..."
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT SWITCHOVER_STATUS FROM V\$DATABASE;
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Switchover Status Check Failed"
        exit 1
    fi
    log "Switchover Status Check Passed"
}

# Function to check listeners
check_listeners() {
    log "Checking Listeners..."
    lsnrctl status
    if [ $? -ne 0 ]; then
        log "Listeners Check Failed"
        exit 1
    fi
    log "Listeners Check Passed"
}

# Function to check temp space
check_temp_space() {
    log "Checking Temp Space..."
    sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT TABLESPACE_NAME, BYTES/1024/1024 AS MB FROM DBA_TEMP_FILES;
EXIT;
EOF
    if [ $? -ne 0 ]; then
        log "Temp Space Check Failed"
        exit 1
    fi
    log "Temp Space Check Passed"
}

# Main script execution
log "Performing Pre-Checks for Data Guard Switchover"
check_dg_config
check_archive_logs
check_db_role
check_dg_broker
check_active_sessions
take_backup
check_lag
check_invalid_objects
check_offline_datafiles
check_flashback
check_switchover_status
check_listeners
check_temp_space

# Cluster-specific checks, enabled with: ./dataguard_precheck.sh RAC
if [ "$1" = "RAC" ]; then
    check_cluster_status
    check_services
    check_interconnect
    check_ocr_voting_disk
fi

log "All Pre-Checks Completed Successfully"
Usage
- Save the script to a file, e.g., dataguard_precheck.sh.
- Make the script executable: chmod +x dataguard_precheck.sh
- Run the script: ./dataguard_precheck.sh (pass RAC as the first argument to include the cluster-specific checks: ./dataguard_precheck.sh RAC)
This script will log the output with timestamps to dataguard_precheck.log and will exit if any of the steps fail. Adjust the script as needed for your specific environment and requirements.
Important Notes
- Backup: Ensure you have a recent backup of both the primary and standby databases before performing the switchover.
- Testing: Test the script in a non-production environment to ensure it works as expected.
- Validation: After the switchover, validate that the new primary and standby databases are functioning correctly.
- RAC Considerations: If you are using RAC, additional steps may be required to manage services and instances across the cluster. Adjust the script accordingly.
Additional Steps for RAC
For RAC environments, you may need to include additional steps to manage services and instances. Here are some additional functions you can add to the script for RAC:
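The notes do not include the functions themselves. The sketch below shows what they might look like using standard srvctl commands; the database names (PRODDB, STBYDB), the service name (app_rw), and the reuse of the pre-check script's log helper are all assumptions:

```shell
# Start application services on the new primary after the switchover.
# Database and service names below are placeholders.
start_rac_services() {
    log "Starting application services on the new primary..."
    srvctl start service -d PRODDB -s app_rw
    srvctl status service -d PRODDB
}

# Stop read-write services on the new standby so applications do not connect.
stop_rac_services() {
    log "Stopping read-write services on the new standby..."
    srvctl stop service -d STBYDB -s app_rw
}
```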
Usage for RAC
- Add the RAC service-management functions to the script.
- Call these functions at the appropriate points, e.g., after the switchover completes on the primary and standby databases.
By incorporating these additional steps, you can ensure a smooth switchover process for both single instance and RAC environments.
Shell Script for Post-Checks
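The post-check script itself is not included in the notes. A minimal sketch along the same lines as the pre-check script might look like this (check names and queries are illustrative, and it requires a database host):

```shell
#!/bin/bash
# Minimal post-check sketch, following the pre-check script's conventions.
LOGFILE="dataguard_postcheck.log"
exec > >(tee -i "$LOGFILE") 2>&1
log() { echo "$(date +'%Y-%m-%d %H:%M:%S') - $1"; }

# Run a named check command; log and exit on failure.
run_check() {
    local name=$1; shift
    log "Checking $name..."
    "$@" || { log "$name Check Failed"; exit 1; }
    log "$name Check Passed"
}

# Report the role and open mode of the local database.
check_role() {
    sqlplus -s / as sysdba <<'EOF'
SET PAGESIZE 0 FEEDBACK OFF HEADING OFF
SELECT DATABASE_ROLE || ' / ' || OPEN_MODE FROM V$DATABASE;
EOF
}

run_check "Database Role" check_role
run_check "Broker Configuration" dgmgrl -silent / "show configuration"
log "All Post-Checks Completed Successfully"
```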
Usage
- Save the script to a file, e.g., dataguard_postcheck.sh.
- Make the script executable: chmod +x dataguard_postcheck.sh
- Run the script: ./dataguard_postcheck.sh
This script will log the output with timestamps to dataguard_postcheck.log and will exit if any of the checks fail. Adjust the script as needed for your specific environment and requirements.
Important Notes
- Backup: Ensure you have a recent backup of both the primary and standby databases before performing the switchover.
- Testing: Test the script in a non-production environment to ensure it works as expected.
- Validation: After the switchover, validate that the new primary and standby databases are functioning correctly.
- RAC Considerations: If you are using RAC, additional steps may be required to manage services and instances across the cluster. Adjust the script accordingly.
By incorporating these post-checks, you can ensure that the switchover process has been completed successfully and that both the new primary and standby databases are functioning correctly.
#!/bin/bash
# Scan the SCAN-listener logs on each cluster node for the distinct service
# names that received connections between <start date> and <end date>.

# Function to display usage
usage() {
    echo "Usage:
$0 <start date> <end date>
e.g. $0 20240710 20240711
"
    exit 1
}

# Check if the correct number of arguments are provided
if [ $# -ne 2 ]; then
    usage
fi

start_date=$1
end_date=$2

# Source the profile
source ~/.profile

# Get the list of cluster hosts from the database.
# GV$INSTANCE lists every instance; V$INSTANCE would return only the local one.
nodes=$(sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT HOST_NAME FROM GV\$INSTANCE;
EXIT;
EOF
)

# Check if nodes are retrieved
if [ -z "$nodes" ]; then
    echo "No nodes found!"
    exit 1
fi

# Function to scan listener logs on a node
scan_listener_logs() {
    local node=$1
    local start_date=$2
    local end_date=$3
    host_name=$(ssh "$node" "hostname -s")
    scan_name=$(ssh "$node" "srvctl status scan | grep $host_name | awk '{print \$3}'")
    if [ -z "$scan_name" ]; then
        echo "No scan on this host!"
        return
    fi
    listener_log_dir=$(ssh "$node" "echo \${ORACLE_BASE}/diag/tnslsnr/${host_name}/listener_${scan_name}/trace")
    tmp_file=/tmp/scan_local_listener_$$.txt
    for f in $(ssh "$node" "find $listener_log_dir -name '*.log' -ctime -30"); do
        # Skip empty logs; the file lives on the remote node, so test it there.
        if ! ssh "$node" "test -s $f"; then
            continue
        fi
        # First and last connection timestamps recorded in this log
        f_start_time=$(ssh "$node" "grep CONNECT_DATA $f | head -1 | awk '{print \$1\" \"\$2}'")
        f_end_time=$(ssh "$node" "grep CONNECT_DATA $f | tail -1 | awk '{print \$1\" \"\$2}'")
        f_start_time_formatted=$(date -d "$f_start_time" '+%s')
        f_end_time_formatted=$(date -d "$f_end_time" '+%s')
        start_date_formatted=$(date -d "$start_date 00:00:00" '+%s')
        end_date_formatted=$(date -d "$end_date 23:59:59" '+%s')
        # Skip logs that do not overlap the requested date range
        if [ $f_start_time_formatted -gt $end_date_formatted ] || [ $f_end_time_formatted -lt $start_date_formatted ]; then
            continue
        fi
        # Extract the distinct service names from connection-establish entries
        ssh "$node" "grep -w establish $f > $tmp_file"
        ssh "$node" "cat $tmp_file | grep -o '(SERVICE_NAME=.*)' | awk -F '[\(=\)]' '{print \$3}' | sort -u"
    done
    # Clean up the temporary file on the remote node
    ssh "$node" "rm -f $tmp_file"
}

# Loop through each node and scan listener logs
for node in $nodes; do
    scan_listener_logs "$node" "$start_date" "$end_date"
done