Backup and Recovery is the most important task for any DBA. As a DBA we have to ensure that all our backups are running successfully and reaching netbackup/TSM .
Below are some of the common scenarios we face :
- We do not keep track of backup failures . Especially when we manage hundreds of servers it is hard to keep track of the scripts and return codes
- We dont validate backup (db2ckbkp) always
- Archive logs are hard to manage . Do you keep track of thousands of archive logs across servers that go to netbackup?
- Are you confident that all your backup files are reaching netbackup. How many times have you found that netbackup restore failed when you needed it ?
- Are you doing backup validation tests ? Are you doing regular recovery tests for critical databases ?
Make sure that you validate all your critical databases so that you can have a good sleep. Having a good backup strategy and ensuring that the strategy is executed perfectly is the utmost important task for a DBA. Below is how I do it and would like to hear your feedback:
- Review your backup strategy ; discuss various failure scenarios and come up with an optimal plan that suits your business needs. Are you meeting your defined RPO(Recovery Point Objective) and RTO(Recovery Time Objective) ?
- Take backup –> Validate backup(db2ckbkp) –> Send to netbackup
- Send archive logs from LOGARCHMETH1 to netbackup frequently for critical databases. Do not delete the local(disk) copy of these logs for some time
- Do a weekly check against sysibmadm.DB_HISTORY to make sure backups are successful. I would run a weekly report with a query like :db2 “select START_TIME,END_TIME,SQLCODE,SQLSTATE,entry_status from sysibmadm.DB_HISTORY where operation=’B’ and date(timestamp(end_time)) > (CURRENT DATE – 7 DAYS)”. You can also setup ssh keys and run this against all servers and pull a good weekly backup summary(http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/).
- If you are using netbackup get access to run a command like “bplist” that will enable you to validate the netbackup activity. Write a script that parses the output of bplist command and validates against sysibmadm.db_history output discussed above. If you are using TSM you can use “db2adutl” to validate the backups
- Perform regular recovery tests for your critical databases. Include rollfoward recovery tests to make sure that archive logs are being backed up properly