Resetting Environments – An Alternate Approach

Overview

There are times in the midst of testing when you need to reset things back to the way they were, so that you can quickly restart testing from a previous point. This is technologically quite easy with current storage infrastructures that support snapshot mechanisms. It is equally easy in database systems, like Oracle, that support advanced recovery capabilities or flashback mechanisms analogous to storage snapshots. Thus technology is not a challenge for the most part. However, with strict governance on segregation of duties, these privileges are not made available to groups outside of system administrators. Relying solely on the system administrators is not only a huge burden on them but also considerably lengthens test cycles. This seriously limits areas such as continuous integration and code coverage testing.

In this article, we discuss an alternate form of resetting the environment that can be made available to a non-privileged user, reducing the load placed on scarce system administration resources.

Common Use Cases

With the technique described below you will be able to reset the environment in about fifteen minutes, depending on the number of objects changed. This applies to the following use cases and may also apply to application test cycles. Of course, there are other situations where you would want to reset an environment, and the technique could apply to those as well.

  • Cloning
  • P2T
  • Patching [to a limited extent]

Limitations

The technique uses the common find method to compare files and performs well even when the number of files is large. However, if files are deleted during the process then the problem of searching for files that do not exist is nontrivial and thus is not supported in the current implementation. Consequently this method is not suitable, in general, for those operations that delete files. Some use cases of deletion that are permissible include:

  1. Files such as log and core files are deleted and are considered immaterial.
  2. Files are deleted and never used in the application.
  3. Files are deleted but are replaced by newer files.
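
One conceivable way to at least report deletions, which is not part of the current implementation, is to walk the reference copy and check whether each file still exists in the live tree. The sketch below assumes the reference copy sits next to the live directory; the helper name list_deleted and the scratch directory layout are illustrative only. It traverses the entire reference tree, so it costs more than a scan for changed files.

```shell
#!/bin/sh
# Sketch: list files present in the reference copy but missing from the live tree.
# list_deleted is a hypothetical helper, not part of the article's scripts.
list_deleted() {
    _live=$1
    _ref=$2
    # The subshell keeps the cd from affecting the caller's working directory.
    (
        cd "$_ref" || exit 1
        find . -type f
    ) | while IFS= read -r _rel
    do
        _rel=${_rel#./}
        [ -e "$_live/$_rel" ] || echo "$_rel"
    done
}

# Demonstration in a scratch directory.
_work=$(mktemp -d)
mkdir -p "$_work/idm/config"
echo a > "$_work/idm/config/a.properties"
echo b > "$_work/idm/config/b.properties"
cp -Rp "$_work/idm" "$_work/idm.pre_test"
rm -f "$_work/idm/config/b.properties"    # simulate a deletion during testing

_deleted=$(list_deleted "$_work/idm" "$_work/idm.pre_test")
echo "$_deleted"
```

A file reported this way could be copied back from the reference directory by hand, but whether that is safe depends on why the operation deleted it.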

Benefits

In environments where storage snapshots are used you should continue to use them as they offer a clean way to handle a variety of cases. The method outlined in this article has the following benefits:

  • It behaves like storage snapshots
  • It does not require root privileges and runs under the application user account
  • It works for file systems that do not support storage level snapshots
  • The file system restore is usually 10-15 minutes depending on the number of files changed and the number of files originally present
  • The database restore is a matter of seconds usually with database flashback (note that this would require DBA access which is typical of a database that is under development control or in a sandbox environment)

Thus this is an excellent technique to use when exercising cloning with FA, for example.

Main Article

How does this work

The technique uses two different mechanisms; one for the database and one for the file system artifacts. It uses the technologies or techniques identified below to mark the database or file system so that any changes can be quickly identified. In the case of the Oracle database this flashback ability is built into the Oracle kernel. For the file system, we use the capabilities of the find command and identify changed or newly created files.

Mechanism

In order to perform the reset, the system has to be prepared. To prepare the environment you first shut down the application processes, followed by the database. For example, if this were Fusion Applications, you would shut down FA, followed by IDM, and then the FA and IDM databases. The specific processes to be shut down depend on the application that you are working with.

For the Oracle database this entails setting it up with a flashback recovery mechanism and creating a restore point. This is a well-defined and documented process (For 11g see Using Flashback Database and Restore Points). Once the database is set up with flashback you create a restore point. In order to remain consistent between the database and any file system artifacts, discussed below, create a database restore point with a unique name. For illustration purposes, we will use tag. Later on, you would use tag to create the equivalent marker for the file system artifacts.

For the file system based artifacts you first make a copy of the directory structure in its entirety. For example, if IDM is installed in /u01/app/IDM and the inventory is created in /u01/app/oraInventory, then these directories are copied to /u01/app/IDM.tag and /u01/app/oraInventory.tag respectively. After the copy is done, a sentinel file is created for each directory: /u01/app/IDM.tag-sentinel and /u01/app/oraInventory.tag-sentinel, for example. Note that if this were an FA installation then you would do the same for the FA artifacts as well. If this is an application other than Fusion Applications, you would do the same for the file system artifacts of that application. The sentinel file serves as the marker, just as the restore point serves as the marker in the Oracle database, to identify any changes that follow that point in time.

Resetting the environment

Resetting the environment is done in the following stages:

  1. Shut down the applications
  2. Flash back the database to the restore point "tag" created before
  3. Find files that were either created or modified after the sentinel file was created, and delete or restore them, respectively

This is illustrated below with an example that simulates a test which modifies both file system and database artifacts.

The test is performed using a small change to the database and changing a file manually for illustration purposes in an IDM environment. The principle behind the test applies to more involved operations, such as cloning and to more elaborate application environments, such as Fusion Applications.

Preparation

  1. Create the first restore point in each of the databases.

create restore point pre_test guarantee flashback database ;

     Note: this is a guaranteed restore point, so if the recovery destination fills up the database will hang. Drop the restore point after testing, or relieve the space pressure by adding adequate space to the flash recovery area.

  2. Make a reference copy of the directory. The command below works on Linux. On other platforms, choose options that preserve symbolic links rather than following them during the copy.

cp -Rp idm idm.pre_test

  3. Create the sentinel file.

touch idm.pre_test-sentinel
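
The copy and sentinel steps can be wrapped in a small helper so that the same tag is used for every directory. This is a minimal sketch, assuming a POSIX cp that accepts -P to copy symbolic links as links; the function name prepare_reference and the scratch files are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Sketch: create the reference copy and the sentinel for one directory.
# prepare_reference is a hypothetical helper; the tag should match the restore point name.
prepare_reference() {
    _dir=$1
    _tag=$2
    if [ -e "$_dir.$_tag" ]; then
        echo "$_dir.$_tag already exists; refusing to overwrite" >&2
        return 1
    fi
    # -R recurse, -P copy symbolic links as links, -p keep timestamps and modes.
    cp -RPp "$_dir" "$_dir.$_tag" || return 1
    # The sentinel's timestamp marks the point in time to reset to.
    touch "$_dir.$_tag-sentinel"
}

# Demonstration in a scratch directory.
_work=$(mktemp -d)
mkdir -p "$_work/idm/config"
echo "SecureListener=false" > "$_work/idm/config/nodemanager.properties"
ln -s nodemanager.properties "$_work/idm/config/current.properties"

prepare_reference "$_work/idm" pre_test
ls -l "$_work/idm.pre_test/config"
```

Refusing to overwrite an existing reference copy is a design choice in this sketch: it prevents a second preparation run from silently replacing the state you intended to reset to.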

Test

Simulate the test by doing the following:

  • Change the ODS password, which prevents OID from starting
  • Change the node manager properties file to make it a secure listener

These two changes simulate the actions taken by an operation such as cloning, P2T, or any other test that leaves the environment in an indeterminate state.

Verifying that the environment is broken

Before we start the environment reset, let us verify that the environment is broken.

Start OID


[2014-09-30T20:09:32-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: myhost] [pid: 22753] [tid: 0] Guardian: [gsdsiConnect] ORA-1017, ORA-01017: invalid username/password; logon denied
[2014-09-30T20:09:32-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: myhost] [pid: 22753] [tid: 0] Guardian: [oidmon]: Unable to connect to database,
will retry again after 10 sec
[2014-09-30T20:09:35-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: myhost] [pid: 22829] [tid: 0] OIDMON_STOP: Thread started
[2014-09-30T20:09:35-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: myhost] [pid: 22829] [tid: 0] OIDMON_STOP: Connecting to database, connect string is oiddb
[2014-09-30T20:09:35-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: myhost] [pid: 22829] [tid: 0] OIDMON_STOP: [gsdsiConnect] ORA-28000, ORA-28000: the account is locked
[2014-09-30T20:09:35-07:00] [OID] [NOTIFICATION:16] [] [OIDMON] [host: scae09cn27] [pid: 22829] [tid: 0] OIDMON_STOP: [oidmon]: Unable to connect to database,
will retry again after 10 sec

Node manager property file changes

This change simulates the application configuration files being changed by the operation. In this case, the node manager property file has been changed.

LogToStderr=false
SecureListener=false
LogCount=1

Change SecureListener to true.

Reset the environment

The environment reset is done by resetting both the database and the file system. Note that since this is a controlled test, there were no processes running. In a more involved test there could be application processes running. These would have to be terminated, preferably in a graceful manner.

Flashback the database

The database is flashed back to reset the password and any other changes that may have occurred during the course of the testing. The script should be treated as a template for your environment and customized as needed. This script is provided in Appendix A.

./reset_idm_database_to_pre_test.sh

 Flashback complete.

 SQL>
Database altered.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> ORACLE instance started.
Total System Global Area 9620525056 bytes
Fixed Size                  2236488 bytes
Variable Size            2885685176 bytes
Database Buffers         6710886400 bytes
Redo Buffers               21716992 bytes
Database mounted.
Database opened.

Reset the file system

This script finds files that have either been newly added or modified (with respect to the timestamp of the sentinel file) and deletes or restores them, respectively. This resets the file system artifacts back to the point in time before testing began. As already noted, this test is done in an IDM environment and the scripts are named based on that context. Your application stack may be different and may require one or more scripts of this kind that are specific to your environment.

cd /u01/app # this may be different in your environment
./reset_idm_binaries_to_pre_test.sh

You should see output such as the following. As can be seen below, the script detected that nodemanager.properties had changed. In addition, other non-essential files that had changed by virtue of starting up OID are also reset.

File changed - idm/config/instances/oid1/config/OPMN/opmn/states/.opmndat will be replaced with idm.pre_test/config/instances/oid1/config/OPMN/opmn/states/.opmndat
idm.pre_test/config/nodemanager/idmhost.mycompany.com/nodemanager.properties
File changed - idm/config/nodemanager/idmhost.mycompany.com/nodemanager.properties will be replaced with idm.pre_test/config/nodemanager/idmhost.mycompany.com/nodemanager.properties
idm.pre_test/products/dir/oid/network/log/sqlnet.log
File changed - idm/products/dir/oid/network/log/sqlnet.log will be replaced with idm.pre_test/products/dir/oid/network/log/sqlnet.log
idm.pre_test/products/app/idm/network/admin/tnsnames.ora
idm.pre_test/products/app/idm/network/admin/tnsnames.ora does not exist, idm/products/app/idm/network/admin/tnsnames.ora file will be removed
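
The classification behind this output can be tried safely in a scratch directory before pointing a script at a real installation. The sketch below is a report-only illustration, not the script from the appendix: the function name classify_changes and the file names are made up, and the timestamps are fixed with touch -t so the result does not depend on timing. Note that find -newer compares modification times, so a change made within the same timestamp granularity as the sentinel may go unnoticed.

```shell
#!/bin/sh
# Sketch: classify files newer than the sentinel as "restore" or "remove".
# Report only; nothing in the live tree is copied or deleted.
classify_changes() {
    _target=$1
    _tag=$2
    find "$_target" -type f -newer "$_target.$_tag-sentinel" | while IFS= read -r _new
    do
        # Map the live path onto the reference copy by swapping the directory prefix.
        _old=$_target.$_tag${_new#"$_target"}
        if [ -f "$_old" ]; then
            echo "restore $_new"
        else
            echo "remove $_new"
        fi
    done
}

# Demonstration with fixed timestamps.
_work=$(mktemp -d)
cd "$_work" || exit 1
mkdir -p idm/config
echo "SecureListener=false" > idm/config/nodemanager.properties
echo "LogCount=1" > idm/config/untouched.properties
touch -t 202001010000 idm/config/nodemanager.properties idm/config/untouched.properties
cp -Rp idm idm.pre_test
touch -t 202001020000 idm.pre_test-sentinel

# Simulated test activity: one file modified, one file created.
echo "SecureListener=true" > idm/config/nodemanager.properties
echo "new" > idm/config/created.tmp

_report=$(classify_changes idm pre_test | sort)
echo "$_report"
```

The untouched file is not reported because cp -p preserved its modification time, which is older than the sentinel.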

Verification

The comparison of the node manager file shows that there are no differences (it has been reset).

diff idm/config/nodemanager/idmhost.mycompany.com/nodemanager.properties idm.pre_test/config/nodemanager/idmhost.mycompany.com/nodemanager.properties
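
The same check can be extended from one file to the whole tree with a recursive diff. This is a minimal sketch under the assumption that no application processes have been restarted yet, since log files written after the reset would show up as differences; verify_reset is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare the whole live tree with the reference copy after a reset.
# verify_reset is a hypothetical helper; diff -r exits 0 only when no differences are found.
verify_reset() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "in sync"
    else
        echo "differences remain"
        return 1
    fi
}

# Demonstration in a scratch directory.
_work=$(mktemp -d)
mkdir -p "$_work/idm/config"
echo "SecureListener=false" > "$_work/idm/config/nodemanager.properties"
cp -Rp "$_work/idm" "$_work/idm.pre_test"

_before=$(verify_reset "$_work/idm" "$_work/idm.pre_test") || true
echo "SecureListener=true" > "$_work/idm/config/nodemanager.properties"
_after=$(verify_reset "$_work/idm" "$_work/idm.pre_test") || true
echo "$_before / $_after"
```

Unlike the scan for changed files, a recursive diff also reveals files that were deleted from the live tree, at the cost of reading every file in both directories.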

OID start now works correctly.

./opmnctl start
./opmnctl startproc ias-component=oid1

opmnctl startproc: starting opmn managed processes...
[clonepoc@scae09cn27 bin]$ ./opmnctl status
Processes in Instance: oid1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
ovd1                             | OVD                |     N/A | Down
oid1                             | oidldapd           |   19078 | Alive
oid1                             | oidldapd           |   19071 | Alive
oid1                             | oidldapd           |   19067 | Alive
oid1                             | oidmon             |   19058 | Alive
EMAGENT                          | EMAGENT            |     N/A | Down

Cleanup

When the tests are done there are a few items that need to be cleaned up. This is especially true of the database, which has a guaranteed restore point. Guaranteed restore points can create space pressures, hence the restore point should be dropped.

Dropping the restore point

  1. Log in to the database with a DBA account or as SYSDBA
  2. Execute the following command:

drop restore point pre_test ;

File system artifacts clean up

If no more restores are required, then remove the sentinel file and the reference directory.

cd /some-directory/app
rm -rf idm.pre_test idm.pre_test-sentinel

Conclusion

Resetting an environment is a common activity in most test and development environments. In many cases this is done using storage level snapshots and should be taken advantage of where feasible. In those cases where there are a lot of development environments with automated testing and where dissemination of the authority to execute these operations is controlled, you can utilize this alternate way of resetting the environment using native tools of the environment and some simple scripting. This would yield an environment that is conducive to a variety of automated testing possibilities.

 

Appendix A – Scripts for resetting the environment

DISCLAIMER

THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

NOTE THAT THESE SCRIPTS ARE WRITTEN FOR THE TEST ENVIRONMENT AND WILL NOT BE APPLICABLE TO YOUR ENVIRONMENT AS WRITTEN. THEY MUST BE CUSTOMIZED AND TESTED ADEQUATELY TO MAKE THEM SUITABLE FOR USE IN YOUR ENVIRONMENT.

reset_idm_binaries_to_pre_test.sh

Save the following as reset_idm_binaries_to_pre_test.sh.

The script uses relative notation and has to be present in the location where the idm directory and the reference directories are. Secondly, the actions are commented out with an “echo UNCOMMENT” statement to prevent accidental executions. Once the script’s behavior has been verified as correct for your environment, remove the echo UNCOMMENT words for the action (cp and rm) to take place.

#!/bin/ksh
# Resets the idm directory to the state captured in idm.pre_test.
# Run from the directory that contains both idm and idm.pre_test.

_tag=pre_test

for _target_dir in idm
do
    _source_dir=${_target_dir}.${_tag}
    _search_string=`basename ${_target_dir}`
    _replace_string=`basename ${_source_dir}`
    # Sentinel file created with touch during preparation
    _ref_file=${_target_dir}.${_tag}-sentinel

    echo _target_dir=$_target_dir
    echo _source_dir=$_source_dir
    echo _search_string=${_search_string}
    echo _replace_string=${_replace_string}

    # Every file newer than the sentinel was created or modified after the copy
    find ${_target_dir} -type f -newer ${_ref_file} | while read -r _new_file
    do
        # Map the live path to the matching path in the reference copy
        _old_file=$(echo "${_new_file}" | sed -e "s|^${_target_dir}/|${_source_dir}/|")
        echo $_old_file
        if [ -f "${_old_file}" ]
        then
            echo "File changed - ${_new_file} will be replaced with ${_old_file}"
            echo UNCOMMENT cp -p "${_old_file}" "${_new_file}"
        else
            echo "${_old_file} does not exist, ${_new_file} file will be removed"
            echo UNCOMMENT /bin/rm -f "${_new_file}"
        fi
    done
done
exit 0
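
As an alternative to editing the script to remove the echo UNCOMMENT guard, the same protection can be expressed as a dry-run switch. This is only a sketch of the idea: the variable DRY_RUN and the wrapper run are hypothetical names that do not appear in the original script, and the file being removed is a scratch file.

```shell
#!/bin/sh
# Sketch: a dry-run switch as an alternative to the "echo UNCOMMENT" guard.
# DRY_RUN and run are hypothetical names, not part of the original script.
DRY_RUN=yes

run() {
    if [ "$DRY_RUN" = "yes" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# Demonstration on a scratch file.
_work=$(mktemp -d)
echo scratch > "$_work/new_file"

# With DRY_RUN=yes the action is only reported.
_msg=$(run rm -f "$_work/new_file")
echo "$_msg"
if [ -f "$_work/new_file" ]; then _kept=yes; else _kept=no; fi

# With DRY_RUN=no the action is carried out.
DRY_RUN=no
run rm -f "$_work/new_file"
```

In the reset script, the cp and rm actions would then be written as run cp -p ... and run /bin/rm -f ..., with the default left at dry run so that an accidental execution changes nothing.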

reset_idm_database_to_pre_test.sh

The script below resets the database using flashback. The same script could be used for both idm and fa with appropriate changes to the ORACLE_SID below.

#! /bin/ksh
export ORAENV_ASK=NO
export ORACLE_SID=idm
. oraenv
sqlplus / as sysdba <<-EOT
set pagesize 1000
shutdown immediate;
startup mount exclusive restrict;
flashback database to restore point pre_test ;
alter database open resetlogs ;
shutdown immediate;
startup
EOT
exit 0

 

 

 

Comments

  1. FYI, This post was originally written by Krish Hariharan.
