To keep the system stable, the operations team should establish routines for day-to-day tasks such as monitoring and backups.
Key points when operating the system are:
- Maintain a good backup/restore plan for disaster-recovery scenarios, and test and verify it regularly.
- Monitor system logs daily for errors and unusual behaviour.
- Ensure time sync is set up properly. This is crucial.
- Monitor disk usage. Full disks will cause system failure.
- Ensure connectivity. Watch for changes in the network setup; a typical problem is a closed communication path between the server and user stores (LDAP etc.).
- Monitor network latency. This is especially important when running in cluster mode.
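
Some of these checks can be automated. The following is a minimal sketch, not a definitive implementation, of a script that covers the disk usage, connectivity, and latency points. It uses only the Python standard library. The function names, the 90% disk limit, and the 0.2-second latency limit are assumptions chosen for illustration, and the time taken to open a TCP connection is only a rough proxy for network latency.

```python
import shutil
import socket
import time


def disk_usage_percent(path="/"):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def tcp_connect_seconds(host, port, timeout=3.0):
    """Time a TCP connection to host:port; return None if it cannot be opened."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return None


def evaluate(disk_percent, connect_seconds, disk_limit=90.0, latency_limit=0.2):
    """Turn raw measurements into a list of warnings (limits are assumed values)."""
    warnings = []
    if disk_percent >= disk_limit:
        warnings.append(
            f"disk usage {disk_percent:.1f}% is at or above {disk_limit:.1f}%"
        )
    if connect_seconds is None:
        warnings.append("user store is unreachable")
    elif connect_seconds > latency_limit:
        warnings.append(
            f"connection took {connect_seconds:.3f}s, above {latency_limit:.3f}s"
        )
    return warnings


def run_checks(path, host, port):
    """Measure once; an empty list means every check passed."""
    return evaluate(disk_usage_percent(path), tcp_connect_seconds(host, port))
```

For example, `run_checks("/", "ldap.example.com", 389)` could be called from a scheduled job, with any returned warnings forwarded to the operations team. The host name here is hypothetical; 389 is the standard LDAP port, but the actual user store may listen elsewhere.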