Saturday, March 6, 2021

Snipping the Alert Log and Working With Unix

For over 20 years as an Oracle DBA, I have worked with multiple versions of Windows as a back-end server platform, but primarily with several versions of Unix (Solaris, HP-UX, AIX, and Digital Unix) and Linux. I have written and/or maintained hundreds of Unix shell scripts, including Bourne, Korn (my early preference), C shell, and bash. I'm so comfortable with scripting that I routinely use it on my Windows PCs at home; I licensed a copy of MKS Toolkit years back and eventually transitioned to Cygwin and/or the Windows Subsystem for Linux on Windows 10. It's difficult to explain all the ways it can be useful, but for instance I might want to timestamp filenames to ensure uniqueness and facilitate archival, search, and retrieval. I can grep text files, edit with the powerful, flexible vi editor, or use tools like sed and awk.

An exhaustive review of the practical issues I've faced with Unix and Oracle in my professional life is beyond the scope of this post. It seems every day spawns some practical issue. Let me give a simple example. With traditional Oracle auditing, plaintext audit files of SYSDBA actions accumulate in an audit destination directory. Security guidelines may require encryption of these files; they may also require that they be retained for at least a year. The first time I tried to list all the audit files for encryption (e.g., *.aud), I got a "too many files" error (the shell's "Argument list too long"). There are some workarounds, e.g., ls | while read fn; do ls -l $fn; done or find . -exec ls -l {} \; . [The encryption logic is separate, but think of it as operating within the do loop or exec logic. For example, you can filter audit file types and/or reorganize those files.]
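As a rough sketch of the loop idea (the audit directory path, the -mtime filter, and the encrypt_file command below are purely hypothetical placeholders, not an actual client setup):

#!/bin/bash
# Hypothetical audit destination; substitute the real audit_file_dest
AUDIT_DEST=/oracle/app/admin/mysid/adump
# Feed .aud files one at a time to avoid the shell's argument-list limit
find $AUDIT_DEST -name '*.aud' -mtime +1 | while read fn
do
   encrypt_file "$fn"   # placeholder for whatever encryption tool/policy applies
done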

One will often encounter nuances. For example, in my two recent posts on generating randomized Oracle passwords from a shell script, the first script could run in Linux and Cygwin, but not in Solaris at work. [Yes, I know you can do similar things using Oracle functionality. I might cover that in a future post.] Another recent example dealt with the simple concept of working with the date function. What's the motivation? I am aware there are tools like Oracle Enterprise Manager Cloud Control which can be configured to alert DBAs to exceptional conditions. But there might be reasons why OEM isn't installed at a client location, and/or you need to run some kind of custom alert, often based on Oracle warnings or errors appearing in the database alert log.

You might think DBAs check alert logs, but let me give an example from personal experience. While I was living in Santa Clara, I worked for a consulting company which had been engaged by a television station in San Francisco. They were running Oracle Applications/E-Business Suite, one of my specialties. Certain email notifications from the applications weren't happening, and they couldn't figure out why. They had engaged consultants from Redwood City (global home for Oracle, although it recently announced it was relocating corporate headquarters to Texas) who hadn't resolved the issue. So, of course, the first thing I checked was the database alert log--and I immediately noticed that a certain tablespace had been trying to extend literally for over 4 months, including during the Redwood City engagement. (I would also assume they got nowhere with Oracle Support.) I immediately added or extended a relevant datafile and, long story short, that resolved their functionality issue.

Now, realistically, I might not need the whole alert log (which could go back several months). I want to snip off the current day's entries (if any) from the alert log. (I can then grep the snipped daily file for warning or error patterns; I'm not including that in this post.)

Note that in my snippet logic below, the timestamp format can differ, even between two Oracle servers at work, so you might need to tweak the date format.

But a simple question motivated the post: what if you want to snip more than today's entries? Maybe you want to track any entries since close of business yesterday. One solution, which I won't describe in detail here, is to offset SYSDATE, transform the result via TO_CHAR and a relevant date mask, and capture the result in an environment variable or work file (say, turning off headings, pagesize, and feedback, tweaking linesize, etc.).
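For what it's worth, here's a minimal sketch of that idea in a shell wrapper; the "/ as sysdba" connection and the date mask are just illustrative choices, not a recommendation:

#!/bin/bash
# Capture yesterday's date from the database in roughly the alert log's format
STAMP=`sqlplus -s "/ as sysdba" <<EOF
set heading off pagesize 0 feedback off verify off
select to_char(sysdate - 1, 'Dy Mon DD') from dual;
exit
EOF`
echo $STAMP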

Consider the following excerpt from a Cygwin session (for common format strings, see here):

$ date
Sat Mar  6 14:42:06 EST 2021

$ date -d "-18 hours"
Fri Mar  5 20:42:41 EST 2021

$ date -d "yesterday"
Fri Mar  5 14:43:01 EST 2021

$ date -d "-3 days"
Wed Mar  3 14:43:35 EST 2021

$ date -d "-3 days" +'%m-%d-%Y %H:%M'
03-03-2021 14:59
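So, for instance, piecing that together with the kind of search string used in the script at the end of this post, something like this should give a grep pattern for yesterday's entries:

$ STR=`date -d "yesterday" +'%a %b %d.*%Y'`
$ echo $STR
Fri Mar 05.*2021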

But what if you don't have a -d option, as on my Solaris system? You can use the TZ environment variable, e.g., for yesterday,

$ TZ=$TZ+29 date
Fri Mar  5 15:03:24 America_New_York 2021

Note that 29 = 24 hours + the offset from GMT. For EST, the offset is 5 hours (4 for EDT).
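Putting that together with a format string should yield a usable search pattern for yesterday, along the lines of the following (from the same Cygwin session; the same idea should carry over to Solaris, though I'd double-check the output there):

$ TZ=$TZ+29 date +'%a %b %d.*%Y'
Fri Mar 05.*2021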

Okay, here's an (older, copied) timestamp from the alert log on my Solaris system, for format purposes:

Wed Mar 03 00:40:29 2021

So here's my bash script excerpt:

#!/bin/bash
# Snip the current day's entries from the alert log into a daily work file
export SCR_HOME=.../scripts
export ALERT=/oracle/app/diag/.../alert_sid.log
# Pattern matching today's alert-log timestamp, e.g. "Wed Mar 03.*2021"
STR=`date +'%a %b %d.*%Y'`
# Line number of today's first timestamp (empty if there are no entries yet)
export START=`grep -in "$STR" $ALERT | head -1 | cut -f1 -d:`
# sed address range: from that line through end of file
export ARG=$START,\$p
sed -n $ARG < $ALERT > $SCR_HOME/ndaily

where ndaily contains the current day's alert log entries.
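As an aside, and untested on my end, if you wanted entries since yesterday instead, you could presumably just swap the STR assignment for one of the yesterday patterns shown above (the GNU date -d version, or the TZ workaround where -d isn't available).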