Wednesday, September 24, 2014

OOW 2014

There is one night every year I can stay up as long as I can manage without fearing the next day. Because, on that particular day I can sleep as much as I want on the plane, trying to fool my body into GMT-8, better known as PDT or  Pacific Daylight Time. Yes, it is Oracle Open World again, in the second nicest city in the world, only beaten by Rio de Janeiro.

It is also an opportunity to blog at an unusual hour (at least for me). Though it is not like many have inquired about my travel plans, but it feels good to put it down in writing that I am happy to see Oracle friends again, some I only see in SFO.

Since last OOW, I became an Oracle ACE Associate, and more than ever I am looking forward to meet friends in the OTN Lounge, and that party, you know.

What else... OUGN is paying for most of my expenses and my employer is letting me go, so I'll try to give them both some satisfaction back. Mostly I will try to target and convince some speakers to come to our conference next year, increasingly known as the Boat Conference. If you are reading this and are curious, please interrupt me whatever I'm doing (mostly).

I've learned a few lessons and so far not overloaded my agenda. This week I will go to the global user group leaders meeting (IOUC) on Thursday and Friday. Wednesday morning until 2 pm I'll be at the EMEA user group stand. I will go to different sessions at the Oaktable World, hang out on different social events and of course a bunch of sessions.

Somewhat related, Tim Hall, aka Oracle-Base blogged this today. Surely he will feel reinvigorated after the Bloggers' Meetup and the small talk with Oracle Nerds from all over the world looking for freebies in the OTN Lounge.

The easiest way to get in touch during OOW next week, is probably on Twitter.

Yours,
@OyvindIsene

PS. I did not know how to spell reinvigorated, the spell checker helped me.
PS2. Come to think about it, there are probably a few more late nights without anxiety the following day. But I can't find a better opening right now.

Wednesday, August 27, 2014

Revoke privileges from PUBLIC may cause ORA-600

Oracle Enterprise Manager (EM) reports on many security policies when they are violated. One of them are about execute privileges on standard PL/SQL packages granted to PUBLIC. One a day this summer with nothing better to do I went on to revoke these privileges. The wages for my efforts was several  ORA-600 errors:
ORA-00600: internal error code, arguments: [qmxqtmChkXQAtomMapSQL:2], [], [], [], [], [], [], [], [], [], [], []
00600. 00000 -  "internal error code, arguments: [%s], [%s], [%s], [%s], [%s], [%s], [%s], [%s]"
*Cause:    This is the generic internal error number for Oracle program
           exceptions.     This indicates that a process has encountered an
           exceptional condition.
*Action:   Report as a bug - the first argument is the internal error number
A quick research on My Oracle Support indicated that some standard packages depends on these privileges granted to PUBLIC. The solution is to grant back to PUBLIC if you are in a hurry, or grant the privilege to the owners of the individual packages (like XDB). The errors continued after the granting was done, but stopped after I bounced the database instance.

Just because EM reports something and the solution seems obvious, it is not alway wise to apply remedy immediately, especially in production. It surprised me a little that EM suggests a security policy that Oracle's own packages do not adhere to.

Sunday, August 24, 2014

Correlation between restore time and used space in a datafile

Recently during restore of a large database (over 10 TB) we monitored the alert log looking for messages like  «Full restore complete of datafile 42 to datafile copy … Elapsed time: 0:39:32»

We saw that the time to restore a 32GB file varied a lot, from around 20 minutes to around 75 minutes. This made me wonder why. We noticed that some of the files belonging to the UNDO tablespace  took usually little time, and then I found a couple of more data files that took little time to restore. Then I thought that maybe the UNDO tablespace at the time of the backup had many unused blocks (because there weren’t many active transactions at the time). So I checked the other two files that did not take long time to restore and found out they had lots of free space in them too. 

So I decided to take the file id and the restore time from all the messages in the alert log and load it into Oracle, in the source database (this all happened when we were making a clone for a RAT session). Then I converted the elapsed time to seconds, found  file size from DBA_DATA_FILES and free space for each file from DBA_FREE_SPACE. Calculating the correlation between restore time and (file size - free space) resulted in 0.96, but for the correlation between restore time and file size, resulted in 0.38. (I used the CORR function.)

By the way the backup was stored on a file system created on a logical volume from a group whose   LUNs where stored on an EMC VMAX 20, but the data files were restored to ASM on LUNs in an EMC XtremIO. 

 As pointed out on Twitter:

it could be caused by the RMAN compression of unused blocks. Next step would be a test with creating a database with several files with different degree of free space, and doing two RMAN backups, one with and another without compression and compare restore time.

This may sounds like a lot of fuzz about nothing, but what we had in mind was the deduplication that the XtremIO does; wondering if the deduplication of empty blocks could  take less time than blocks with real data in it.

Wednesday, July 23, 2014

DBMS_LDAP and ORA-12703

In case you are using the DBMS_LDAP package from a reasonable modern environment you may run into the error:
ORA-12703 this character set conversion is not supported
The simple solution to this is to run:
begin
  DBMS_LDAP.UTF8_CONVERSION := false;
end;
/
or if you are using SQL*Plus or SQL Developer, just:
exec DBMS_LDAP.UTF8_CONVERSION := false

Monday, July 7, 2014

Connecting to a guest console in Oracle VM

I struggled to connect to the console of a newly created VM. The VM was created using Oracle VM Manager (OVMM) and despite efforts to install the required packages I could not use the console button in the Manager interface.


I noticed that the VNC server built into the VM was listening on 127.0.0.1 only (on the OVM server), you can see this if you check the file vm.cfg for the guest. Without analysing how the OVMM would try to connect to the VNC server on the other host I decided to create an ssh tunnel from my Mac to the OVM server:

ssh -L 5901:localhost:5901 root@fu.bar.com

(replace fu.bar.com with IP address or host name of your OVM server).

Now, I think I have found a reason why Oracle specifically recommends TightVNC as a VNC viewer; I tried two other VNC viewers on my Mac, but I could not make them work with the console on the VM, sometimes it connected, but later it would hang, another just aborted. I downloaded the java version of the VNC viewer from http://www.tightvnc.com/download.php and connected without any problems to localhost and port 5901 (using the tunnel created above).

Friday, July 4, 2014

Importing ISOs to Oracle VM using OVM Manager

Just a simple trick to import ISO files to the repository using Oracle VM Manager (OVMM). The import function supports imports through protocols http, https and ftp. If you want to do this in the supported way you can use a simple web server to serve the ISO files for you.

Python comes with a simple http server built in. If you have python installed (which is the case on both the OVM server and the OVMM host) you can start it with:
python -m SimpleHTTPServer

To make it easy for you, go to directory where the ISO file is located, start the http server and verify that you can see the file listed in a web browser. Then do the import in Oracle VM Manager.


In this case I copied the ISO file to OVM server (dom0) and started the http server there, and the URL for the import becomes as shown in the capture below (or use the IP address of the server where the ISO file is located and remember to use port 8000):

The name of the imported ISO is the same as the file name, rename to a say Oracle Linux 6 U5 and put in the origin of the file in description.

Wednesday, July 2, 2014

Real Application Testing - some notes

This post is a mixture of experiences from a recent project and an overview on how to use RAT in a project.

Real Application Testing (RAT) is a licensed feature for the Enterprise Edition (EE) of the Oracle database. It was released on 11g, but can be used to test migration of databases from 9i and 10g to 11g (assuming that the source databases have a required version/patch). The term Real is, I reckon, used to hint at a more authentic testing process. The tests are executed based on a capture from a real production system.

RAT can be used to:

  • Evaluate new hardware, including migration from one platform to something completely different.
  • Verify execution of SQL from today’s production on a newer version of the database
  • Analyze the effect of changes in configuration
  • Scaling up, see how current system works with higher load; this works best for queries / reporting and not so well for data manipulation (DML).

A project involving RAT can be divided into these activities:

  1. Capture workload from a production system
  2. Create a clone of production system with flashback database enabled and a restore point created.
  3. Prepare for replay on test database and with agents on application servers or other client machines.
  4. Execute replay
  5. Generate reports (RAT and AWR)
  6. Flashback database
  7. If you haven’t reached a conclusion yet, repeat from 4.

In two projects I was asked to do step 2 - 6 and figured out that I should write down what I learned for posterity. (The fact that I have a friend in Belgium that reminded me about blogging more is a mere coincidence.) I will not put many details on step 1 and 2 here.

Some think it is a good idea with a long capture, but keep the replay part in mind and the time it takes to actually execute the replay. Though the flashback of the database is impressively fast, a longer replay (with much updates) will lead to more time spent on setting the database back with flashback. If you can limit the capture to say two hours that is a time saver later, but if you need to confirm say scalability for one night's ETL jobs, you may need to capture several hours. Before you start with capture, make sure you have all the backups you need to duplicate the database with point in time recovery as close to the starting point as possible. This should not be an issue since all productions system (IMHO) should run in archiving mode and being backed up. This is an opportunity to verify your backups.

For large databases it may take quite some time to transport the backup (and capture files) to new platform and have it restored to a clone. Flashback database is a wonderful feature since it takes litle time to flashback a database compared to a full restore for large databases. Flashback is turned on with command:
ALTER DATABASE FLASHBACK ON;

A restore point is created with:

CREATE RESTORE POINT B4_REPLAY GUARANTEE FLASHBACK DATABASE;

The name of the restore point is your decision (within limits). You will need some space for flashback logs, depending of the amount of changes to the database during replay.

In a bypass; yes, there are as usual two ways to do this. Enterprise Manager for the young and aspiring DBAs, or you can use the PL/SQL packages. When I have the routines ready for using the API I tend to favour them, so I did most of the work with the packages DBMS_WORKLOAD_CAPTURE and DBMS_WORKLOAD_REPLAY together with the packages for creating AWR reports DBMS_WORKLOAD_REPOSITORY. Sure, I had some looks at nice graphs and reports in EM too.

Going into more details on step 3; the result of the capture is a collection of many files that needs to be transported to the new platform. On the test database create a database DIRECTORY object that points to the file directory where they are stored. First step is to process the capture with:

exec DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE('REPLAY')

where REPLAY is the name for the directory. The result of this is a new directory with a name that starts with pp followed by a version number. Copy it all to the machine(s) that will host the replay agents (actually replay the workload and send it to database.)  If you are doing this on a RAC-database, copy the same stuff to other nodes with same path for the directory object.

In each installation of the Oracle database and client you have the workload replay client (wrc) included under $ORACLE_HOME/bin. It’s task is to send SQL from the captured workload to the database.  Before you start the actual replay a calibration should be run first. With ORACLE_HOME and PATH set, execute:


wrc mode=calibrate replaydir=/tmp/replay

This will report the recommended minimum number of clients. Too few means that the replay clients cannot replay the workload fast enough. The same will happen if the servers where the wrc binaries are overloaded, or the network between them and the database have any issues.

So if you have to execute 10 clients, put 10 copies of this in a script:

#!/bin/bash
wrc system/welcome1@clonedb mode=replay replaydir=/tmp/replay 
sleep 1
wrc system/welcome1@clonedb mode=replay replaydir=/tmp/replay
sleep 1

wrc system/welcome1@clonedb mode=replay replaydir=/tmp/replay 
sleep 1

Don’t run it yet.

Now we’re onto step 4. You repeat the following sequence if you decide to repeat a replay, but the preparation hitherto is only done once. Initialize the replay with the following command, using a distinct name for the replay each time:

exec DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name=>’run_1’, replay_dir=>’REPLAY’)

In the middle of this we have to do some remapping... The connections to the database used during production capture are not the same as used during replay. After the previous command has completed you can query the view DBA_WORKLOAD_CONNECTION_MAP and see what was used during capture. This view is empty before the initialize_replay procedure is executed, after it has values in the column CAPTURE_CONN, but REPLAY_CONN is empty. First time you have to look at the output and see if you have many different connections to the database and figure out how you will remap them. Often, only one connection string is used (connection string as the key in TNSNAMES.ORA) and the mapping is simply replacing it with the one you are now using to reach the new clone database. In that case you can use the following PL/SQL block to remap:

begin
  for i in (select conn_id, capture_conn 
    from dba_workload_connection_map m, dba_workload_replays r
    where replay_id = id
    and name = ‘run_1’)
  loop
    dbms_workload_replay.remap_connection(connection_id=>i.conn_id, replay_connection=>'cloneserver:1521/clonedb');
  end loop;
  commit;
end;
The name => ‘run_1’ refers to the name you gave to the replay. You can either use an entry from tnsnames.ora or connection string as shown above when setting the new connection. If you have some load balancing or application partitioning where you use different connections to the database they have to be remapped accordingly. Though you have to execute this for every replay, you only spend time on this first time and keep the PL/SQL ready for next time you do a replay.

Now after initialize, next step is prepare (!) In this step you set various parameters that how the replay is executed. In it simpliest way:

exec DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization => TRUE, think_time_scale=>100, scale_up_multiplier=>1)
synchronization set to ‘TRUE’ is equal to ‘SCN’, another more advanced (according to manual) is ‘OBJECT_ID’. This sometimes causes problems for the replay and testing several times with synchronization to ‘OFF’, ‘SCN’ or ‘OBJECT_ID’ may be necessary.

Later on when a normal replay has been executed it is always interesting to see what happens when you reduce think time or scale up the load. Note, that DML, when scaled up won’t actually update the same way, so this needs some testing from you. For reporting usage scaling up should work, however, note if you double the number of users, they are in realilty not likely to send the same SQL as today twice as many times, that is, users usually will run slightly different queries. But if this is a problem you have to find out by analysing the reports - if an a few statements are popular in an OLTP system you can assume they will remain so after you scale up.

Now you can add filter for what will be actually replayed, but I am going to leave it out. After you have exuted the prepare_replay procedure above, you can kick off the script on the clients, that is the script that includes the wrc commands. They will start and write out a message that they are waiting for replay to start.

You can start the replay with:

exec DBMS_WORKLOAD_REPLAY.START_REPLAY()
When this is executed the replay is running and the agents (wrc) writes out a message that replay has started.

EM has a nice page for monitoring a replay. A few times we have seen that the replay lags behind compared to capture, but catches up later on, or sometimes takes longer time than the capture. Which sometimes means an error with RAT itself and not necessarily a problem with the new platform.

In step 5, after replay is finished, you generate a RAT report and also AWR reports. In DBA_WORKLOAD_REPLAYS you’ll find the replay_id of the replay and the RAT report can be generated with:

select dbms_workload_replay.report(42,’HTML’) 
from dual;

Generate the AWR reports as usual. If you want to compare AWR report from capture with replay, you will have to import the AWR data (AWR data is automatically exported after the capture and included in the same directory). First verify that the capture is visible in DBA_WORKLOAD_CAPTURE. If not you can import all the data about captures found in the directory with:

exec dbms_workload_capture.get_info(‘REPLAY’)
where ‘REPLAY’ is the name of the DIRECTORY-object where capture is stored. The capture(s) should now be visible in DBA_WORKLOAD_CAPTURES. The AWR-data belonging to it can be imported with

select dbms_workload_capture.import_awr(42,’LUSER’) 
from dual;

Take a note of the number returned. This is a generated DBID that you will use later to refer to the database of the capture (this means that this is a made up DBID that is different than the DBID of the database where you executed the capture). 42 above is the capture_id from DBA_WORKLOAD_CAPTURES and LUSER is the name of a schema used for staging. Just create on temporary if you don’t have one ready.

The import can take some time. When it is finished you can see in DBA_HIST_SNAPSHOT what snapshots are available, check for the DBID that was returned in last command. You can now create comparsion reports with:

select * 
from table(dbms_workoad_repository.awr_diff_report_html(123456789,1,120,130,734501234,1,10,20));

The first three parameters are DBID, start snap id, end snap id; all from the first database (source/capture) and the last three the same respectively for the second database where replay took place. Creating these reports work pretty well in SQL Developer, use Ctrl-A and Ctrl-C to copy the html code and paste it into an editor and save it. You can actually create more reports than I have shown here, see the documentation for the mentioned packages to get an idea. AWR data is automatically exported to the same directory after a replay. This means you can later be import them into another incarnation of the database to compare different replays.

Step 6, flashback database is pretty easy. Shutdown the database, start it in mount, execute:

flashback database to restore point b4_replay;

Then open it with:

alter database open resetlogs;

If you are on RAC you have to shutdown the other nodes and start them after the previous command is completed.

There used to be some bugs that caused the database to crash spectacularly with an ORA-600 during flashback leaving the database useless and forcing you to do a new restore/duplicate from backup. Since I’m stressed during a project like this, and therefore I don’t have time to waste, I tend to avoid structural changes (like moving tablespaces around after creation of the restore point) that can challenge the flashback. I am not sure if the bugs are still there, though.

Step 7, are you done yet? I usually have to run the replay over and over several times trying out with different parameters for procedure dbms_workload_replay.prepare_replay(...). This takes a lot of time, don’t underestimate the project. Also remember that people will question your results.

In conclusion, try to figure out before you start what you want to prove. Are you comparing two systems, you need to agree on a common set of parameters; or are you trying to prove that a system is good enough (without comparing it to the old?) Try to lower you ambitions or scope, the time it takes with restore, replay, generating reports, looking over them, and possibly analyze further with extended trace, starting over,... it may feel like a walk in the desert sometimes.