I've made my living as an Oracle DBA since 1993. I've often had auxiliary responsibilities, e.g., development, tech lead, ERP administration and system administration, but my primary responsibility has been as a DBA. Did I grow up with the ambition of being a DBA? No. I grew up in the era of mainframe computing. In fact, I had to use card punch machines to write my first computer programs (FORTRAN) in college; My college didn't have its own computer, so I had to submit my card decks to a sister campus in San Antonio. (There was a 3-day turnaround, so I could only afford to make one mistake for the weekly assignment.)
Originally I thought I had a vocation for the Roman Catholic priesthood, which is why I attended OLL, a private college. (I was an Air Force brat, and my family at the time lived at a military installation about 150 miles south of San Antonio. I didn't have a car or a license.) I didn't necessarily want to be a diocesan priest, like my maternal uncle. I had done well in school and naturally thought about teaching as a career. I did think of joining a teaching order of the Church, e.g., the Jesuits or the Oblates. In fact, I had a preliminary interview with the Jesuits at my college dorm; I don't think they ever followed up. Originally, I had intended to major in secondary education and probably would be a math/science high school teacher in Texas, except I fought my academic adviser to get into a philosophy class. (My adviser wanted me to take art history; but philosophy is a common major for prospective seminarians.) That course changed my life; philosophy remains my one true love. I transitioned to a double major (in math and philosophy). But philosophy is not a core reference discipline for secondary education. More to the point I had progressed from wanting to be a teacher to becoming a professor. There was the one experience that cemented the idea in my mind; I went to visit one of my principal philosophy professors, an Oblate priest, in his office. I walked into a room where he was probably playing his favorite, Leonard Bernstein, on a record player, pungent pipe smoke filled the room, walls crowded with books: making a living talking about deep thoughts sounded like heaven on earth.
Unfortunately, beyond teaching, there aren't many professional opportunities for philosophers, and job opportunities for philosophy PhD's were extremely limited and competitive--basically one had to hope for existing tenured faculty to resign or retire. (A number of people had frank conversations with me about the marketability of my philosophy degree.) So I shifted to consider becoming a math professor, In fact, one of the RC sisters/nuns who had earned her degree at a Louisiana university had lobbied to get me considered for its PhD program; I was verbally promised a graduate student financial aid package, but to be honest, I had my sights set on a more prestigious program at the University of Texas at Austin (where my academic adviser had earned her PhD). Another case of woulda, coulda, shoulda. I almost instantly got into political trouble as a teaching assistant to a visiting East European professor from Ohio State. (The students were at a point of rebellion; he was difficult to follow and had unorthodox tests. As a teaching assistant, I had 2 problem solving class session classes, and the students were basically unhappy with me as a surrogate to the professor, feeling maybe I wasn't doing enough to prepare them for his exams; to me, the definitive moment was when a coed, the best student in class, literally threw her second exam in my face as she left the room. (It's not like I knew what he was going to ask on the exam.) Another classic moment: one day he starts writing on a series of blackboards in the front and then on the right side of class. The students were just mechanically just jotting down everything he was writing; he wasn't really lecturing; he was just filling up the boards without comment. Suddenly, about 8 boards into his writing, he seemed to get lost in whatever he was trying to do; he looked up at me and sheepishly asked if I knew where he was going with this. (I didn't.)
The other TA set me up, saying we had to have an intervention. We agreed I would start. The professor was troubled by what I had to say and asked Jeff for his comment. Jeff turned on me, effectively saying, "Yeah, a few students are complaining, but you get some of those malcontents in any class." What the hell? I had no idea why Jeff had set me up like that, but the professor decided that I was the one with an evil agenda and became obsessed with me. I'll never forgot how he followed me into my graduate real analysis class and started bitching about me to the professor. It then got even worse; I got called to the department chair's office where I got yelled at and threatened over my "disloyal" actions. I think I knew at that moment I would never get my terminal degree from UT and would have to settle for my MA.
There were other considerations, too. It turns out there was a glut of mathematicians on the market and newly minted PhD's, if they could find a job, would have to settle for some obscure state branch university. So they decided to raise doctoral qualification to 6 exams with limited retries and a time table. I had befriended a married couple in the department; she got through her 6, but her husband was dropped from the program lacking the final sixth exam. (I never really sat for the exams, but they had become a moot point: during the spring semester it turned out I had lost my teaching assistant position for the next year, and I really couldn't afford to go to school. I got a token grader job in my last semester working on my Master's thesis, but I had to ask my maternal grandfather for a $500 loan so I could have it professional typed to department standards. (This was in the days of typing, and you needed specialized equipment to print mathematical equations.)
My job prospects were limited. I originally thought that the Air Force was going to offer me a commission and have me trained as a meteorologist, but I was passed over in the selection pools. There were no local community college instructor jobs. I tried to check into trying to pick up high school teaching credentials, but I didn't have the resources.
Now most math majors at the time almost inevitably minored in computer science, but UT had an excellent philosophy program, and it was a no-brainer at the time. Now it was a lost opportunity, as I found out a pure (vs. applied) math degree wasn't that marketable. Long story short, I eventually got a computer programming trainee position at a prominent insurance company in San Antonio. They were having problems training other programmers (say, COBOL) to write in APL, a mathematically notated interactive language that IBM had developed for rapid prototyping; APL was often used in areas like property actuary at the insurance company. There's another story behind that experience; let's just say I taught myself APL well enough to attract a job offer with a substantial increase in salary the next year from the largest APL timesharing vendor in Houston. (The timesharing business model was to sell premium-priced computer time running customized enhanced APL applications; this was a time when buying another mainframe meant big bucks, plus it was a workaround to in-house application backlogs.) The introduction of the PC was a disruptive technology; by the time I earned my MIS PhD in the mid-80's, the companies I had worked for had all gone out of business.
After about a year of living in Houston, I started looking at picking up an MBA at the University of Houston, which had a part-time nighttime program which I could fit into my work schedule. (I also hoped that it might improve my dating prospects; in fact, I did date a few coeds but nothing serious.) By my second year I was aware that the business school had a PhD program in MIS, which reopened my deferred dream of becoming a professor, was admitted and eventually got a part-time teaching stipend (2 classes a semester), enabling me to study full-time.
One of the attractions of the discipline was MIS classes were "hot"; I used to joke that we could offer "Basketweaving for MIS" and hit a class limit of 35 in 10 minutes of registration. Business schools desperately wanted newly-minted PhDs; granted, the land grant name schools were highly competitive (our first 2 graduates went to Arizona State and Iowa State), but when I first hit the academic market in the mid-1980's at conferences associated with two competitive doctoral consortia, I literally did not have to dial a phone; I fully booked every 30-minute slot available and probably could have booked at least 2-3 more days of interviews. That doesn't mean I got dozens of campus visit offers (prerequisite to a job offer; it usually involves a research presentation and several potential colleague/administrator interviews. I usually got 2-4 campus visits while I was on the market and 0-2 offers.)
I had started my MIS doctoral coursework by the time I had finally finished up the MBA in 1983; I had taken two database classes under Richard Scamell, easily one of the best professors I've ever had (and my future dissertation chair). Scamell had a contact at Oracle; this was around the time of Oracle 2, when documentation came in three-ring binders and SQL*PLUS was called "UFI" (user friendly interface). UH was running Oracle on an IBM-compatible mainframe. What I particularly remember is that we had a hard time doing our database projects due to lousy uptime; in fact, the entire class had to take incompletes that fall semester, I think that's the first time I had really encountered DBA issues or was familiar with that role. I don't think at the time I knew it would become a future career; recall, I was convinced I would a professor until retirement. Still, of all the roles I would deal with as a professor (systems analyst, etc.), a DBA role had the most appeal.
My dissertation was not on database, but my research interests included human-computer interaction, including the use of SQL, the de facto standard query language behind relational databases, like Oracle, DB2 and SQL Server. I also had a focus on documentation. One of my office mates back at UH, Minnie Yen, did look at query languages in her dissertation, so around the turn of the 1990's as my academic career began to wind down, I had proposed a joint project, my writing a "minimal manual" on learning SQL for a research project including one or more of Minnie's students.
I didn't get a chance to teach database during my 3 years at UWM, because 2 of my tenured colleagues who did more technical database research "owned" the courses (although I seem to recall they were using popular PC data applications, not SQL. I think while I was there Oracle had decided to roll out a PC bundle for Oracle 5 for roughly $200 (not that bad considering you got a huge stack of paper documentation with the bundle), and I bought a copy; however, I don't think Oracle was installed at any of the 3 universities I taught at as a professor.
I finally got a chance to teach database (and, separately, data structures) at UTEP; we didn't have Oracle, but I was determined for my students to have exposure to SQL--and used an option similar to when I taught structured 1985-standard COBOL using textbook bundled Ryan-McFarland software. In this case, the textbook I chose came bundled with a limited version of XDB-SQL.
UTEP had hired me principally because they were up for AACSB accreditation and wanted my research vita/resume at file (I'm not saying that I was a deciding factor, but I would be a second published MIS researcher in the department), and I soon discovered that the department chair (it was a fusion department, including production logistics faculty) quickly reneged on key promises and probably the most obnoxious department chair I've ever met. I'm not going to go into specifics here other than my stay in El Paso was a hellish experience, so much so I went on a campus visit in Louisiana during finals week of that first fall semester. (It was not my idea; I wanted to defer it to the spring. For some reason, they refused And then they decided not to make an offer, which I would have accepted.) By the spring, it was clear that UTEP wasn't that thrilled with me, either. My options were limited by that time because most of the colleges had done their screenings at the late fall conferences. It turned out that Illinois State had a temporary 1-year appointment available because one of their applied computer science professors was trying to set up a research facility.
As in the other two universities, there were some nasty office politics at play. I went on the market, but I found that the market had drastically changed. There were probably 3 reasons: (1) to help the regional schools unable to hire new MIS PhDs, Indiana University offered a summer program to retrain other professors (e.g., PLM, QMS) to MIS, (2) there was a bumper crop of new PhD's; for example, I had leapfrogged 12 ABD's to graduate #4 (#3 defended her dissertation one week before me); almost all of these went through as UH warned them they might need to re-qualify as their 5-year clock ended, plus others I had studied with joined the market; (3) MIS class enrollments started to cool off. I had noticed this at UWM, where classes of 35 the next year had available slots, like 28 enrolled. Also, there was a recession, and even computer science graduates, who normally didn't want anything to do with business schools, were submitting their vitae.
The market had radically shifted; schools that normally got a dozen resumes and hoped their top 3 would be available suddenly got flooded with 85 resumes, so then they quickly had to adapt. It depended on the school but they developed filters which didn't necessarily favor me, despite my research and teaching record, not being an ABD. Maybe, for instance, a male-dominated department might want to recruit a female candidate (I know one school that did this), there was a demand for computer networking faculty, etc.
I think I did get 2-3 campus visits, but no offers. Ironically, Illinois State was in the market, mostly to cover the service course (I was working for a hybrid Applied Computer Science department). Initially they wanted to talk to me, but I expect that some campus politics came into play, and they refused to return my calls.
I didn't realize it at the time but my academic career was over. I did get an offer in 1994, but the offer came several weeks after my visit, and I had just started a new job; a lot of professional recruiters were worried that I wanted to use them until the academic market recovered. I was between jobs when I went on the campus visit, and I made it in good faith. They had led me to believe that I would get a status in 2 weeks and that didn't happen. (Of course, it's possible that they ran into bureaucratic inertia) It certainly wasn't the money; they were offering the highest figure I ever got from a school and more (by over $10,000) than I was making in my new job. This would be the last time I got a serious offer. I came close to joining the faculty of a Catholic college in the DC metro area a few years later. That, ironically, was probably my biggest disappointment; I have a BA from a Catholic college; I went on campus visits to 4 Catholic colleges and no offers. It wasn't money; for example, Providence College made it clear the max they would pay is $35K, far lower than my state-sponsored offers.
So looking for work, I had to look back at returning as an IT professional. (An industry research position was not an option; for example, IBM at that time was struggling.) As mentioned above, a lot of IT recruiters saw me as trying to use the company until an academic offer came along; it didn't matter what I said. Others discounted academic computing experience and considered my prior job skills as probably eroded. Still others refused to consider me for anything beyond entry-level work.
I finally found some APL work at an IBM subsidiary in the Dallas area, but I was really focusing on reinventing myself as an Oracle DBA. The problem was chicken or the egg in terms of experience. Finally, an EPA contractor was willing to offer me a position at the Chicago regional headquarters. They paid me peanuts, but I got my foot in the door. On literally my second day on the job, they lost their recompete on the contract (which basically meant I would be looking for a new job in a few months). I got my first extended gig at a private company which is now an Equifax subsidiary. I spent 2.5 years there (and would probably still be there if the owners didn't sell). [The CEO worried that Accenture was a threat to their business model.] About a year into the job, I got unsolicited calls from a Coopers & Lybrand recruiter helping set up an Oracle Apps/EBS practice. I essentially transitioned from an in-house DBA to a consultant by 1997 and have mostly served in a contractor role since 2004. I may expand on my experience in future posts, although my first blog post features some insights from my specialty as an Oracle Apps DBA.
This blog may contain, but is not restricted to, Oracle database technical notes, tips, tutorials, or reflections on professional experiences.
Monday, May 1, 2017
Friday, April 28, 2017
Tip: You Should Consider Purging the DBA_Recyclebin
The Oracle Recyclebin feature introduced with version 10G is somewhat analogous to deletions from Windows recycle bin functionality. Essentially Microsoft obfuscates the location of the file and also marks its storage as available for other use by the system. There are third-party tools which may allow you to recover them, although the probability of recovery declines over time as the system has need for available space for other purposes. In a similar way, when an authorized Oracle user drops objects like tables and dependent objects, like indexes and constraints, they aren't necessarily removed from storage (assuming a default recyclebin parameter ON/TRUE); they are essentially renamed with a BIN$ prefix. You can identify the current OBJECT_NAME from DBA_RECYCLEBIN using OWNER, ORIGINAL_NAME; you can recover by FLASHBACK TABLE owner.original_name TO BEFORE DROP. (Note that there are some nuances and exceptions to recycle bin functionality (e.g., recovering dropped schemas or objects of the SYS user) and limited recovery options.) Also note that the ability to recover dropped objects lessen over time as space is needed for other purposes (e.g., inserts into same tablespace tables or user space quotas).
Oddly enough, the recycle bin issue came into play in an indirect way. To provide a context, the virtual container recycle bin will grow over time. You can control this in various ways, e.g., turning the initialization parameter off or executing the purge command.
In my current position, we often do selective schema exports (in addition to regularly scheduled exports or database backups) on request for development. These exports are scripted to the extent that fellow privileged administrators with limited Oracle background can run them. We also administer multiple databases that are similarly designed and structured, and the export task is repeated across database servers.
The wall time for the 2-schema export task run manually is for most servers within 15 minutes [Note that in this context, we are discussing data pump, i.e., expdmp, not exp.]. One day a system administrator ran the ad hoc task request on a third database instance when the export of the second schema seemed to hang for literally 3.5 hours. I analyzed the alert log; I tried export alternatives (if I excluded statistics and the table filter in the export parfile, I could do the export in less than 15 minutes). Development wasn't happy with the delays and pointed out no other DBA's were reporting similar issues with their servers; it must be local administrator "incompetence". It wasn't obvious why this server was behaving differently from the others; they had similar configurations, applied patches, etc.
I currently don't access to Metalink/My Oracle Support; I tried to search on abnormally long exports or statistics using Internet search. (Why did I pursue that? Because it seemed to be hanging on the initialization stage referencing statistics: ") Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS") But most of the problems I found focusing on statistics were on the import, not the export stage.
A DBA colleague had stumbled across a discussion of dba_recyclebin. I don't know which source he found, but I have found this discussion on data migration using data pump which advises (without explanation) purging the DBA_recyclebin before running expdp.
What we found is literally thousands of objects in the recyclebin, the vast majority of which existed in the problem schema. My colleague ran purge dba_recyclebin from the internal account as sysdba, and the proof is in the pudding: the subsequent wall time for the problem schema export shrank to about 3 minutes.
It's still not clear why the dropped object problem was peculiar to this database instance given standard installation, patching and scripted application functions. But if you don't disable the recyclebin default, at least create and schedule periodically a purge table script from DBA_RECYCLEBIN (e.g., using droptime filter). Chances are, you can't count on being able to recover long-dropped objects anyway.
Oddly enough, the recycle bin issue came into play in an indirect way. To provide a context, the virtual container recycle bin will grow over time. You can control this in various ways, e.g., turning the initialization parameter off or executing the purge command.
In my current position, we often do selective schema exports (in addition to regularly scheduled exports or database backups) on request for development. These exports are scripted to the extent that fellow privileged administrators with limited Oracle background can run them. We also administer multiple databases that are similarly designed and structured, and the export task is repeated across database servers.
The wall time for the 2-schema export task run manually is for most servers within 15 minutes [Note that in this context, we are discussing data pump, i.e., expdmp, not exp.]. One day a system administrator ran the ad hoc task request on a third database instance when the export of the second schema seemed to hang for literally 3.5 hours. I analyzed the alert log; I tried export alternatives (if I excluded statistics and the table filter in the export parfile, I could do the export in less than 15 minutes). Development wasn't happy with the delays and pointed out no other DBA's were reporting similar issues with their servers; it must be local administrator "incompetence". It wasn't obvious why this server was behaving differently from the others; they had similar configurations, applied patches, etc.
I currently don't access to Metalink/My Oracle Support; I tried to search on abnormally long exports or statistics using Internet search. (Why did I pursue that? Because it seemed to be hanging on the initialization stage referencing statistics: ") Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS") But most of the problems I found focusing on statistics were on the import, not the export stage.
A DBA colleague had stumbled across a discussion of dba_recyclebin. I don't know which source he found, but I have found this discussion on data migration using data pump which advises (without explanation) purging the DBA_recyclebin before running expdp.
What we found is literally thousands of objects in the recyclebin, the vast majority of which existed in the problem schema. My colleague ran purge dba_recyclebin from the internal account as sysdba, and the proof is in the pudding: the subsequent wall time for the problem schema export shrank to about 3 minutes.
It's still not clear why the dropped object problem was peculiar to this database instance given standard installation, patching and scripted application functions. But if you don't disable the recyclebin default, at least create and schedule periodically a purge table script from DBA_RECYCLEBIN (e.g., using droptime filter). Chances are, you can't count on being able to recover long-dropped objects anyway.
Friday, March 11, 2016
Tip: A Counterintuitive RMAN Extraction of the Controlfile
On one of my last gigs, my DBA colleague and I needed to restore an R12.2 EBS backup to two VMs (using ESXi 5.5) built on RHEL 6.6. Typically one node serves as the database tier and the other as the Apps tier. The source servers were not available in our network; we accessed backups stored on a hard drive plugged into a docking station attached to an Ubuntu file server. There was an RMAN backup stored under a relevant folder. We then scp'ed the RMAN folder onto our target database VM. The source database version was Enterprise Edition 11G R2.
It is not my intention here to provide here a tutorial on the process of restoring datafiles to target mountpoints on your target database server. The key step is locating/extracting the controlfile from its backup piece. (The basic idea is once you locate the controlfile, you need to edit stored file mountpoints in the controlfile via RMAN or SQL*Plus for the directories on your target database server before you catalog/restore from the backup pieces.)
We had a copy of the RMAN log file, and there was an initial backup piece which seemed to have some variation of CONTROL embedded in its name, but no matter how we tried to extract the controlfile, e.g., restore controlfile from '[pathed backup piece file name]', the operation failed. Did we have a corrupt copy of the backup piece? Yet another DBA who thought he had used the same backup for an alternate restore insisted the controlfile was in that first piece.
We tried looking at My Oracle Support Knowledge Base and Googled for something relevant. But the real issue was that we had to look outside the box, i.e., the idea that the controlfile was in that first backup piece.
I don't have the exact post I stumbled across Googling that evening after coming home from work, but the gist was, if you don't find the controlfile in the first backup piece, try the extraction against each remaining backup piece and you'll eventually stumble across the backup piece with the controlfile. I intuitively knew this must be true.
The next morning I had my colleague read the post in question, and he manually retyped the command substituting the backup piece file name in sequence, finally getting an extraction near the end of the set of around 23 files.
I'm not sure I can generalize from one case, but in this case we had the first n-1 backup pieces the same size and the controlfile was in an odd-sized/smaller backup piece.
It is not my intention here to provide here a tutorial on the process of restoring datafiles to target mountpoints on your target database server. The key step is locating/extracting the controlfile from its backup piece. (The basic idea is once you locate the controlfile, you need to edit stored file mountpoints in the controlfile via RMAN or SQL*Plus for the directories on your target database server before you catalog/restore from the backup pieces.)
We had a copy of the RMAN log file, and there was an initial backup piece which seemed to have some variation of CONTROL embedded in its name, but no matter how we tried to extract the controlfile, e.g., restore controlfile from '[pathed backup piece file name]', the operation failed. Did we have a corrupt copy of the backup piece? Yet another DBA who thought he had used the same backup for an alternate restore insisted the controlfile was in that first piece.
We tried looking at My Oracle Support Knowledge Base and Googled for something relevant. But the real issue was that we had to look outside the box, i.e., the idea that the controlfile was in that first backup piece.
I don't have the exact post I stumbled across Googling that evening after coming home from work, but the gist was, if you don't find the controlfile in the first backup piece, try the extraction against each remaining backup piece and you'll eventually stumble across the backup piece with the controlfile. I intuitively knew this must be true.
The next morning I had my colleague read the post in question, and he manually retyped the command substituting the backup piece file name in sequence, finally getting an extraction near the end of the set of around 23 files.
I'm not sure I can generalize from one case, but in this case we had the first n-1 backup pieces the same size and the controlfile was in an odd-sized/smaller backup piece.
Thursday, March 10, 2016
If You Are a First-Time ERP DBA
There are some basic concepts which you may not know and which may seem to be common sense, but a lot of general (vs. ERP) Oracle DBAs don't know them. I'm going to illustrate these points with real life horror stories from my professional experience. I am not arguing this list is exhaustive; I may do follow-up posts.
An ERP database is different and more complex; it is designed to use with an enterprise-wide integrated software system which may include components from different functional areas in the company, like accounting, manufacturing, and human resources. I have been at customer sites which may employ dozens of DBAs, but only have 1 or 2 ERP DBAs. I've seen some companies separate ERP DBAs into separate roles: the software administrator and the physical DBA role; the former may do software configurations, patching, and application administrative duties, and the physical DBA handles the database administration operation and maintenance, backups, etc.
An ERP database is different and more complex; it is designed to use with an enterprise-wide integrated software system which may include components from different functional areas in the company, like accounting, manufacturing, and human resources. I have been at customer sites which may employ dozens of DBAs, but only have 1 or 2 ERP DBAs. I've seen some companies separate ERP DBAs into separate roles: the software administrator and the physical DBA role; the former may do software configurations, patching, and application administrative duties, and the physical DBA handles the database administration operation and maintenance, backups, etc.
- Read the Documentation
This may seem to be an obvious point, but let me provide an explanation. I will also draw on this scenario in a follow-up discussion on platform differences. I've had a lot of experience with Oracle's ERP Suite, traditionally called Oracle Applications (Financials, Manufacturing, etc.) or more recently referenced as e-Business Suite.
I was an hourly subcontractor to an outsourcing company doing operations work for the park district in a large Midwestern city. My Apps DBA experience has ranged from 10.7 including a client-server architecture to more web-oriented 11.0.x to 11i to current R12. The city had an aggressive 3-month upgrade schedule to go from 11.0.x to 11i. One of the reasons I had been selected for the role was because I had experience with both versions. The city had awarded the project to a different consulting unit of the vendor. (The project had a tight deadline because the city wanted it done 6 weeks before the end of the calendar for end-of-year processing. What the client didn't know was that the vendor was not assigning full-time employees to staff the projects but subcontractors like me. The vendor project manager did not ask for my input in hiring the 3 or so project DBAs. I was not asked to be on the project in part because of the different funding sources.)
The platform was Windows NT; there are some nuances in running Oracle on Windows. For purposes of the present discussion it suffices to know that the Windows registry presents issues. Additional background: the upgrade process was guided by the installation of the RapidWiz utility. The stages of the upgrade are called categories. The upgrade usually includes a higher version both of the ERP software and the database. Now Oracle also allows a way for customers to upgrade their database separately from the application software, but you generally have to run a series of interoperability patches on the application layer to accommodate a higher version database. (We'll return to this point later in the post.)
In a broader view, the upgrade includes a series of steps to prepare the database for the upgrade. You then shut down the database and upgrade the database if necessary (see above discussion). You then run the Apps driver patch and then complete post-upgrade activities. Most of what RapidWiz does is install software for the new database and software version. It's not really needed for the closeout steps for the database prior to upgrade--with one major exception: some of the scripts used for the closeout steps are installed with RapidWiz. (If I had been Oracle, I would have bundled these separately.)
Now the Apps upgrade manual said in boldface type and centered in a centered paragraph: do NOT run this on a Windows server which already has Oracle server software installed. I had given the upgrade team a production database clone with Oracle database server software installed and properly configured. Now Oracle didn't spell out the reasons why you should avoid running RapidWiz on the server, but it's clear from context if you understand the implications of the Windows registry. For example, I believe 11.0.3 had RDBMS 8.0.x installed and RapidWiz installed version 8i. If you try to bring up a lower-version database with higher-version database software, the software is going to balk that you need to upgrade the database first.
Now the project team DBAs really didn't talk to me about their activities other than in an all-hands morning staff meeting. I'm in the server room working on the production server when I heard the project DBAs in a general state of confusion. They explained they had just bounced the server and the database and listener wouldn't come up. The lead guy CB was a piece of work; I quickly pointed out they were trying to run on 8i executables and asked them if they had run RapidWiz. They conceded the obvious. They didn't have a clue on how to proceed; I went and made a number of changes to the registry to point back to version 8 database server locations. (CB, of course, protested that Microsoft and/or Oracle doesn't support hacking the registry. Of course, Oracle didn't support what he did causing the problem.)
I later tried to educate CB on the meaning of the warning. I said, "Look, I can install RapidWiz on a fresh server and network-share the scripts directory so you can finish up your closeout activities." And CB is demanding to know where Oracle is saying to do that, as if I have control over Oracle software deployment or documentation; even if Oracle didn't flesh a solution, it was fairly obvious given the context. For example, you have a new database server with 11i installed and you migrate the production database to that server after closeout. CB creates a convoluted interpretation of the plain-word meaning of the warning which the consulting PM and my boss seem to buy into, and my boss told me I had to open up a ticket with Oracle Tech Support to support my understanding. The analyst was unsympathetic to my case ; "Didn't you tell them that violating warnings is against Technical Support agreements and we won't support future Apps issues?" I said yes, but this was politics, and I needed a piece of paper telling them that Oracle meant what was said in that boldface warning.
I was an hourly subcontractor to an outsourcing company doing operations work for the park district in a large Midwestern city. My Apps DBA experience has ranged from 10.7 including a client-server architecture to more web-oriented 11.0.x to 11i to current R12. The city had an aggressive 3-month upgrade schedule to go from 11.0.x to 11i. One of the reasons I had been selected for the role was because I had experience with both versions. The city had awarded the project to a different consulting unit of the vendor. (The project had a tight deadline because the city wanted it done 6 weeks before the end of the calendar for end-of-year processing. What the client didn't know was that the vendor was not assigning full-time employees to staff the projects but subcontractors like me. The vendor project manager did not ask for my input in hiring the 3 or so project DBAs. I was not asked to be on the project in part because of the different funding sources.)
The platform was Windows NT; there are some nuances in running Oracle on Windows. For purposes of the present discussion it suffices to know that the Windows registry presents issues. Additional background: the upgrade process was guided by the installation of the RapidWiz utility. The stages of the upgrade are called categories. The upgrade usually includes a higher version both of the ERP software and the database. Now Oracle also allows a way for customers to upgrade their database separately from the application software, but you generally have to run a series of interoperability patches on the application layer to accommodate a higher version database. (We'll return to this point later in the post.)
In a broader view, the upgrade includes a series of steps to prepare the database for the upgrade. You then shut down the database and upgrade the database if necessary (see above discussion). You then run the Apps driver patch and then complete post-upgrade activities. Most of what RapidWiz does is install software for the new database and software version. It's not really needed for the closeout steps for the database prior to upgrade--with one major exception: some of the scripts used for the closeout steps are installed with RapidWiz. (If I had been Oracle, I would have bundled these separately.)
Now the Apps upgrade manual said in boldface type and centered in a centered paragraph: do NOT run this on a Windows server which already has Oracle server software installed. I had given the upgrade team a production database clone with Oracle database server software installed and properly configured. Now Oracle didn't spell out the reasons why you should avoid running RapidWiz on the server, but it's clear from context if you understand the implications of the Windows registry. For example, I believe 11.0.3 had RDBMS 8.0.x installed and RapidWiz installed version 8i. If you try to bring up a lower-version database with higher-version database software, the software is going to balk that you need to upgrade the database first.
Now the project team DBAs really didn't talk to me about their activities other than in an all-hands morning staff meeting. I'm in the server room working on the production server when I heard the project DBAs in a general state of confusion. They explained they had just bounced the server and the database and listener wouldn't come up. The lead guy CB was a piece of work; I quickly pointed out they were trying to run on 8i executables and asked them if they had run RapidWiz. They conceded the obvious. They didn't have a clue on how to proceed; I went and made a number of changes to the registry to point back to version 8 database server locations. (CB, of course, protested that Microsoft and/or Oracle doesn't support hacking the registry. Of course, Oracle didn't support what he did causing the problem.)
I later tried to educate CB on the meaning of the warning. I said, "Look, I can install RapidWiz on a fresh server and network-share the scripts directory so you can finish up your closeout activities." And CB is demanding to know where Oracle is saying to do that, as if I have control over Oracle software deployment or documentation; even if Oracle didn't flesh a solution, it was fairly obvious given the context. For example, you have a new database server with 11i installed and you migrate the production database to that server after closeout. CB creates a convoluted interpretation of the plain-word meaning of the warning which the consulting PM and my boss seem to buy into, and my boss told me I had to open up a ticket with Oracle Tech Support to support my understanding. The analyst was unsympathetic to my case ; "Didn't you tell them that violating warnings is against Technical Support agreements and we won't support future Apps issues?" I said yes, but this was politics, and I needed a piece of paper telling them that Oracle meant what was said in that boldface warning.
- ERP Databases Can Have Different Requirements
Take one example to make the point. The default character set for databases in the US distribution is US7ASCII. DBAs should be generally aware that this parameter can be modified only at the time the DBA creates the database. The traditional remedy for a misspecified character set is to create a new database with the non-default character set and do an export/import. If you are talking about databases several gigabytes or terabytes in size, this can be nontrivial.
The scenario for this example involved the US subsidiary of a European company which operates a chain of airport stores; the subsidiary is located in a Baltimore suburb, and my boss was okay with me commuting to projects outside of the Chicago area where I lived at the time. My new DC area consulting company employer had been targeting me for an Oracle Apps project at a major tobacco products company headquartered in NYC. The project had been delayed when a female company DBA who had gotten into an argument with the client female VP was walked off the project in just her second week. So I was tasked to replace her. It was just a couple of weeks before Christmas, and I was worried about the political baggage I would inherit. You want to have your lead off project a success story; I know stories of DBAs literally walked out of a job on their second day (not me personally). On the plus side, I got to drive a Mercedes to work. Not my idea, by the way. My boss had hired a tech manager who had made lease of a Mercedes a condition of employment. The manager had been caught by the boss running his own business interests at Oracle's annual conference. They were stuck with the lease of the Mercedes once the tech manager was terminated. I tend to be cheap when spending someone else's money; I will sometimes even take the cheapest subcompact when my boss would sign off on an SUV. In this case, it made no sense to pay for a second vehicle.
On my first day, I met with the production DBA, who wasn't a client employee but a contractor who had advanced to a DBA role from his prior work as a developer. When I first saw him, he had just kicked off the Smart Client driver patch for the test system running 10.7 Apps. The client had just licensed Oracle Human Resources in conjunction with the already installed Oracle Financials. The HR component required the GUI component of Smart Client, which was not installed in production.
I had run through this driver patch probably a dozen times without incident. When I got to his desk, the driver had just broken down. Oracle gave a message to the effect; "You've just encountered a problem. Do you wish to continue on with the patch (Y/N)?" This is a matter of common sense, but he was about to say 'Y', when I screamed at him to stop. Maybe not so common sense.
I looked at the error in question and it involved the routine compilation of a code object. I looked up the issue in Oracle Metalink/My Oracle Support's Knowledge Base. No hit. I started looking at the infrastructure; the environmental variables were correctly specified. At some point the thought crossed my mind: the environment thinks the database is running on the correct Western European character set; what does the database think its character set is? Bingo--US7ASCII. I later showed the client I could run the patch against a database running Western European character set without a problem. Two issues; first, this meant the production database had the same problem; second, how could they have been operating all this time with the wrong character set and not notice until my discovery?
At some point I became aware that the Apps database had been on a different server and was rehosted to the current AIX server by an independent, obviously non-Apps DBA. I don't know the nuances of the migration (e.g., platform implications) or whether they wanted to reorganize the database in the migration process, but rather than clone the existing database across the servers, the DBA recreated the Apps database doing an export/import. All the DBA's activities had been scripted and logged. Bingo! I fished out the CREATE DATABASE statement, which did not specify the Western European (or other) character set.
There was only one supported solution, as described above. The production DBA made it political by reproducing a Metalink note describing a hack to the character set value but even the note that mentioned it described it as a temporary measure until the official solution was implemented. The troublemaker DBA told the VP I was trying to bump up consulting fees. I told the client it didn't matter to me if they assigned the production server to the production DBA, me, or an independent consultant; what concerned me was that running an unsupported configuration violated their support contract with Oracle, which they couldn't risk in their production environment.
The VP of IT didn't let it go. One day she called me in on what she thought was a gotcha. I had to choose my words carefully. She showed me proof that Oracle supported US7ASCII. Yes, of course, Oracle supported databases using its default character set, but that didn't apply to Oracle Apps databases. I had to show her in writing in the first appendix to the Apps installation manual which explicitly pointed out the requirement of Western European. (The follow-up remedy is a separate story, including server hardware problems not reported to management.) But the point remains; the non-Apps DBA in doing the migration across servers had not done due diligence on the prerequisites for creating an Apps database, and his simple omission of character set meant his efforts had to be redone. Why the client had selected a non-Apps DBA in the first place is a separate issue
The scenario for this example involved the US subsidiary of a European company which operates a chain of airport stores; the subsidiary is located in a Baltimore suburb, and my boss was okay with me commuting to projects outside of the Chicago area where I lived at the time. My new DC area consulting company employer had been targeting me for an Oracle Apps project at a major tobacco products company headquartered in NYC. The project had been delayed when a female company DBA who had gotten into an argument with the client female VP was walked off the project in just her second week. So I was tasked to replace her. It was just a couple of weeks before Christmas, and I was worried about the political baggage I would inherit. You want to have your lead off project a success story; I know stories of DBAs literally walked out of a job on their second day (not me personally). On the plus side, I got to drive a Mercedes to work. Not my idea, by the way. My boss had hired a tech manager who had made lease of a Mercedes a condition of employment. The manager had been caught by the boss running his own business interests at Oracle's annual conference. They were stuck with the lease of the Mercedes once the tech manager was terminated. I tend to be cheap when spending someone else's money; I will sometimes even take the cheapest subcompact when my boss would sign off on an SUV. In this case, it made no sense to pay for a second vehicle.
On my first day, I met with the production DBA, who wasn't a client employee but a contractor who had advanced to a DBA role from his prior work as a developer. When I first saw him, he had just kicked off the Smart Client driver patch for the test system running 10.7 Apps. The client had just licensed Oracle Human Resources in conjunction with the already installed Oracle Financials. The HR component required the GUI component of Smart Client, which was not installed in production.
I had run through this driver patch probably a dozen times without incident. When I got to his desk, the driver had just broken down. Oracle gave a message to the effect; "You've just encountered a problem. Do you wish to continue on with the patch (Y/N)?" This is a matter of common sense, but he was about to say 'Y', when I screamed at him to stop. Maybe not so common sense.
I looked at the error in question and it involved the routine compilation of a code object. I looked up the issue in Oracle Metalink/My Oracle Support's Knowledge Base. No hit. I started looking at the infrastructure; the environmental variables were correctly specified. At some point the thought crossed my mind: the environment thinks the database is running on the correct Western European character set; what does the database think its character set is? Bingo--US7ASCII. I later showed the client I could run the patch against a database running Western European character set without a problem. Two issues; first, this meant the production database had the same problem; second, how could they have been operating all this time with the wrong character set and not notice until my discovery?
At some point I became aware that the Apps database had been on a different server and was rehosted to the current AIX server by an independent, obviously non-Apps DBA. I don't know the nuances of the migration (e.g., platform implications) or whether they wanted to reorganize the database in the migration process, but rather than clone the existing database across the servers, the DBA recreated the Apps database doing an export/import. All the DBA's activities had been scripted and logged. Bingo! I fished out the CREATE DATABASE statement, which did not specify the Western European (or other) character set.
There was only one supported solution, as described above. The production DBA made it political by reproducing a Metalink note describing a hack to the character set value but even the note that mentioned it described it as a temporary measure until the official solution was implemented. The troublemaker DBA told the VP I was trying to bump up consulting fees. I told the client it didn't matter to me if they assigned the production server to the production DBA, me, or an independent consultant; what concerned me was that running an unsupported configuration violated their support contract with Oracle, which they couldn't risk in their production environment.
The VP of IT didn't let it go. One day she called me in on what she thought was a gotcha. I had to choose my words carefully. She showed me proof that Oracle supported US7ASCII. Yes, of course, Oracle supported databases using its default character set, but that didn't apply to Oracle Apps databases. I had to show her in writing in the first appendix to the Apps installation manual which explicitly pointed out the requirement of Western European. (The follow-up remedy is a separate story, including server hardware problems not reported to management.) But the point remains; the non-Apps DBA in doing the migration across servers had not done due diligence on the prerequisites for creating an Apps database, and his simple omission of character set meant his efforts had to be redone. Why the client had selected a non-Apps DBA in the first place is a separate issue
- Be Aware That Independent Modifications or Operations on Proprietary Software Objects May Violate Vendor Support Standards
Several years back I worked initially as a subcontractor through an agency in which I call a "babysitting gig"; in this case, the client was an American marketing/support subsidiary of a large Japanese maker of chip testing machines; its clients included IBM, Intel and Micron. Almost a year earlier they had relocated their American headquarters from the northwest suburbs of Chicago to Silicon Valley (Santa Clara), near one of their largest clients. Many employees refused to relocate with the company, including the company's DBA because the company hired a local DBA (inferred from the fact he told me he had been there 7 months). It was a relatively small IT group, maybe 15 people. There was a soap opera beyond the context of this post; he had applied for a vacant IT manager position and was denied. So he had given his notice and didn't have a backup to transition to. This was mid-1999 at the peak of the Internet bubble. The client couldn't recruit a replacement, and I agreed to a 5-week contract commuting out of Chicago (it later got extended, and I ended up relocating to Santa Clara for 18 months when I agreed to go perm).
The arrangement was for V to transition me over a week, but he was ready to leave my second day. As a former senior principal consultant for Oracle Consulting, I had high standards; I soon discovered that the client was behind megapatches maybe up to 3 years or more. For example, one product was at level F and the latest megapatch was at the Q level. And it was poorly documented; for instance, they had replaced Oracle's standard check with a custom one (so if you patched Apps, it would regenerate its standard check). I lobbied the newly hired IT manager to update the Apps. This was critical because if we ran into a problem with production, the first question Oracle Support would ask is whether the Apps were current patch-level including all interim product fixes (vs a one-off patch). The users, particularly the accountants, resisted change, worrying any patching would break functionality. I applied the megapatches one weekend, skipping my commute back home.
Now the accountants were attached to "green screen" character-mode 10.7 which was analogous to Lotus 1-2-3/DOS in terms of keystrokes to get to the desired screen access; they also had a more direct connection to the database server. Oracle had desupported green screen mode at the end of 1998 in favor of its GUI approach, and the accountants were not impressed by what they saw as usability issues navigating through a series of pop ups to get to a desired screen. As someone with an active research interest in human factors in computing, I was empathetic with the accountants' complaints on Oracle force feeding an unrequested, less usable interface on them. On the other hand, I had a responsibility to the company to ensure our configuration was supported through our maintenance contract with Oracle. I could not afford for something to blow up in production and Oracle say, apply these patches and call us in the morning if the problem hasn't gone away--or have Oracle ask why an accountant is still running green screen after desupport.
I can still remember this six-foot female receivables clerk telling her boss she could get through all her invoices today if she could go green screen or part of them doing things "Ron's way"--his choice. (I only mention her height, because my boss was hinting she might like me, I don't have issues dating taller women. However, I like them more when they don't fight me doing my job.)
One particular Asian female accounting manager who was unhappy I replaced V was agitating over my "reckless patching", confident I was "screwing things up". She finally discovered an anomaly: the company's fixed assets appeared in green screen, but not in GUI. Even worse, she contacted Oracle Support directly, without going through me. At some point, they ask her to do a "row who" on a record, and she reports back "ANONYMOUS". Game! Set! Match! Oracle had tricked her unknowingly into revealing the company had violated Oracle's support agreement. Let me explain: Oracle wants you to do things through the application, and they tag records accordingly. When a record has an anonymous tag, it means the record was inserted independently of Apps processes, e.g., a SQL statement or SQL-Loader. Oracle Consulting gave my boss a project quote of $10,000 to fix the product, plus the Apps would be unavailable for 2 weeks (a non-starter because of the effects on operations).
It took me a while to piece together what had happened and how to fix the problem. When the company installed the Fixed Assets module some 3 years earlier, developers didn't want to go through the busy work hassle of setting up the subcategory "None" for each of 17 asset categories, so they targeted the target Apps table and SQL-LOADed it into the database. What they didn't know was that in working through the setup, Oracle converted the subcategory name into uppercase. The green screen interface would allow for a mixed case match. However, what the GUI software did was convert the search subcategory name into uppercase and do a search on the relevant database object. Since the subcategory names were "None", not "NONE", no assets appeared under GUI.
Devising a fix was easier said than done, because the subcategory name field was referenced in multiple objects across the database, and of course the fix itself would violate Oracle's agreement. After thoroughly testing my fix in test, I then promoted it into production after hours. I won a record 3 CEO awards during my 13 months with the company but not for this issue. Did I get a raise, bonus or award? Nope, but there is the satisfaction of knowing you solved a problem few people are capable of solving. (Why did I leave after 13 months? A long story; my boss had reneged on some promises and was taking advantage of me as an employee (70-hour workweeks at a below-market salary), and I was about to go to an in-person at a real estate portal in Austin, when my boss had a heart attack while visiting our Vermont branch office (primarily servicing IBM). I waited until he was back on his feet before resigning.)
The arrangement was for V to transition me over a week, but he was ready to leave my second day. As a former senior principal consultant for Oracle Consulting, I had high standards; I soon discovered that the client was behind megapatches maybe up to 3 years or more. For example, one product was at level F and the latest megapatch was at the Q level. And it was poorly documented; for instance, they had replaced Oracle's standard check with a custom one (so if you patched Apps, it would regenerate its standard check). I lobbied the newly hired IT manager to update the Apps. This was critical because if we ran into a problem with production, the first question Oracle Support would ask is whether the Apps were current patch-level including all interim product fixes (vs a one-off patch). The users, particularly the accountants, resisted change, worrying any patching would break functionality. I applied the megapatches one weekend, skipping my commute back home.
Now the accountants were attached to "green screen" character-mode 10.7 which was analogous to Lotus 1-2-3/DOS in terms of keystrokes to get to the desired screen access; they also had a more direct connection to the database server. Oracle had desupported green screen mode at the end of 1998 in favor of its GUI approach, and the accountants were not impressed by what they saw as usability issues navigating through a series of pop ups to get to a desired screen. As someone with an active research interest in human factors in computing, I was empathetic with the accountants' complaints on Oracle force feeding an unrequested, less usable interface on them. On the other hand, I had a responsibility to the company to ensure our configuration was supported through our maintenance contract with Oracle. I could not afford for something to blow up in production and Oracle say, apply these patches and call us in the morning if the problem hasn't gone away--or have Oracle ask why an accountant is still running green screen after desupport.
I can still remember this six-foot female receivables clerk telling her boss she could get through all her invoices today if she could go green screen or part of them doing things "Ron's way"--his choice. (I only mention her height, because my boss was hinting she might like me, I don't have issues dating taller women. However, I like them more when they don't fight me doing my job.)
One particular Asian female accounting manager who was unhappy I replaced V was agitating over my "reckless patching", confident I was "screwing things up". She finally discovered an anomaly: the company's fixed assets appeared in green screen, but not in GUI. Even worse, she contacted Oracle Support directly, without going through me. At some point, they ask her to do a "row who" on a record, and she reports back "ANONYMOUS". Game! Set! Match! Oracle had tricked her unknowingly into revealing the company had violated Oracle's support agreement. Let me explain: Oracle wants you to do things through the application, and they tag records accordingly. When a record has an anonymous tag, it means the record was inserted independently of Apps processes, e.g., a SQL statement or SQL-Loader. Oracle Consulting gave my boss a project quote of $10,000 to fix the product, plus the Apps would be unavailable for 2 weeks (a non-starter because of the effects on operations).
It took me a while to piece together what had happened and how to fix the problem. When the company installed the Fixed Assets module some 3 years earlier, developers didn't want to go through the busy work hassle of setting up the subcategory "None" for each of 17 asset categories, so they targeted the target Apps table and SQL-LOADed it into the database. What they didn't know was that in working through the setup, Oracle converted the subcategory name into uppercase. The green screen interface would allow for a mixed case match. However, what the GUI software did was convert the search subcategory name into uppercase and do a search on the relevant database object. Since the subcategory names were "None", not "NONE", no assets appeared under GUI.
Devising a fix was easier said than done, because the subcategory name field was referenced in multiple objects across the database, and of course the fix itself would violate Oracle's agreement. After thoroughly testing my fix in test, I then promoted it into production after hours. I won a record 3 CEO awards during my 13 months with the company but not for this issue. Did I get a raise, bonus or award? Nope, but there is the satisfaction of knowing you solved a problem few people are capable of solving. (Why did I leave after 13 months? A long story; my boss had reneged on some promises and was taking advantage of me as an employee (70-hour workweeks at a below-market salary), and I was about to go to an in-person at a real estate portal in Austin, when my boss had a heart attack while visiting our Vermont branch office (primarily servicing IBM). I waited until he was back on his feet before resigning.)
- Be Careful of Database Shut Downs/Start Ups in an ERP Environment
There are a couple of examples to make this point, the second involving the park district upgrade project cited under my documentation point above.
I started my ERP DBA experience in 1996 doing SAP Basis Administrator duties at a well-known baking supply company located in the southwest suburbs of Chicago. At that time, Basis Administrators were in high demand and could fetch 6-figure salaries. The company had put a former network administrator through over $30,000 in training at SAP, Veritas, Oracle, and Sun; they were concerned that he could be poached by another company, so they hired me as a form of insurance. On their part, they made a commitment to some SAP training.
In reality I ended up doing the day-to-day administration while the other administrator was off to meetings. A couple of other points setting up the story: the company had Deloitte consultants on site, and bouncing SAP and the database had to be done in a specific order; the process usually took around 20 minutes. Basically bouncing relevant servers/services at least once a week was important because the server memory would fragment, and if a job couldn't be allocated a big enough chunk of memory, it would fail--and these could snowball very quickly, typically on a Friday. For the most part we could get through 4 days or so between reboots, but the fifth day was potentially a problem. And let me tell you, working through 20-letter German error terms is not fun. I remember the odometer was at about 35 jobs one Friday, and he started playing politics going through approvals. I'm telling him, "We've got to bring things down now; no time to waste." The odometer had surged past 220 as now all new jobs were failing before I got the okay to bring things down. I spent literally weeks working through the failed jobs.
As part of the company's agreement, I had signed up for an SAP on Unix course. (Their courses were so popular that I used to joke they could offer a $4200 course on "Basket weaving for SAP" and sell out. SAP wasn't forgiving of late changes like cancellations or rescheduling.) So my boss just before I'm scheduled to start training, decides to pull me out of class to shadow Deloitte contractors (who didn't want to be shadowed) working on an upgrade. (SAP at the time strictly qualified who could do upgrades.I had not qualified through their system to do upgrades.) I'm obviously not happy; and my boss, rather than lose my course fee, decides to let our main Unix admin go. Huge mistake of Biblical proportions.
So at lunch time we're eating our sack lunches when my colleague takes a phone call from the junior Unix admin who reports some message he's seeing on one of the monitors. Now recall my point about the protocol of bouncing SAP; my colleague's brain seem disengaged from his mouth as he responds, "Well, when I was at Sun school, they said when we see a message like that, reboot." Say what? I turned to my colleague, astonished at what he just said, He seemed to suddenly realize what came out of his mouth. "Of course, don't do it NOW...." He turned pale, hung up and headed to the server room. "Gotta go." All of a sudden, everything went black. About 20 minutes later, my colleague asks for my help. "Oracle is not coming up. I don't understand these messages I'm seeing." Now my colleague had a lot of pride and would probably lose a limb before asking me for help. It's difficult to explain; Oracle's interface encountering hardware issues can be obscure. It was like Oracle was saying, "Now I see the redo logs; now I don't." I know we're running Veritas and I quickly infer we have corrupt mirrors. I had no training in Veritas but I had him open it to some display which showed mirrors being rebuilt one pane at a time. I extrapolated recovery time to be 3.5 hours, which was almost exactly the case. In the meanwhile. we've got 75 people whose work responsibilities depending on SAP being up coming back from lunch and discovering things are dead.
My boss asked my colleague what happened, and he lies, of course, saying something like it's a random incident, shit happens, etc. I repeatedly argue that it's a Veritas issue, not an SAP or Oracle issue. I suspect my boss realized that having his senior Unix admin, who would have known better than to reboot in a nonstandard manner in the middle of a business day, attend my class would come back to bite him on the ass. So my boss has my colleague talking to the SAP SWAT team (high level emergency support), and they're looking at the database initialization parameter file. Idiots! If there was an issue with the parameter file, it would be consistent across database bounces, and this was different. The CIO, based on my boss's information, tells the company users the cover up story. I later tell the CIO what really happened, and my supervisor immediately fires me. To be honest, I was close to quitting on my own over my boss's boneheaded decision. There were management and training issues there, and scapegoating me didn't change anything.
Now to the metropolitan park district issue: Oracle is configured somewhat differently on the NT vs. Unix platform. I'm going to respond in terms of the platform at the time; I have not worked on NT backservers recently and there could be nuances since then. But typically you have a couple of key services per database SID. If you manually shutdown the database, it may shut down one service but not both. The key takeaway is both services need to be down before you make a usable cold backup.
I had advised CB to keep the database in archivelog mode (which enables more flexible recovery), but he disregarded my advice (DBAs don't normally keep test databases in archivelog mode, but you don't accumulate logs closing out an Apps database for upgrade), Now I have to explain a huge mistake he made to explain my point about shutting down one of my Apps databases, Vision Demo.
Oracle has traditional maintained statuses of concurrent managers (Apps background processes) in the database. If you shut down the concurrent mangers properly before shutting down the database, these statuses will be updated. The issue comes into play on start ups; Oracle looks at the stored status to see whether the process is up; it's not going to restart a process already up. So you could have a case where the process is really down but the stored status is invalid/corrupt, like if the database come down before closing out the Apps background processes. Oracle has a cmclean.sql script available on Metalink/MOS which should clear out relevant corruptions.
One day I was doing a last-minute check of Vision Demo, a training database, to ensure everything was up for park service personnel in for training. The concurrent managers were down. This was curious; maybe the concurrent managers hadn't been launched. But no, they weren't coming up. It had to be sticky statuses. I apply cmclean. The concurrent managers are back up. Now I know what happened: CB had brought down my Vision Demo database without first bringing down the managers. Incompetent; but more importantly, why was he messing with Vision Demo?
I ask CB if he knew anything about Vision Demo being down. He said, yes; he explained that he had just added a fourth datafile to the system tablespace and realized that he could have just as easily have simply extended an existing datafile. So he manually deletes the new datafile. The database crashes and won't recover. (OK, if I'm a DBA manager, I terminate the DBA for cause immediately when this happens. You NEVER, EVER screw with the system tablespace, and you never fat-finger a datafile unless you're sure it's not in the database or removed in a process like offlining.) But what does this have to do with my Vision Demo?
CB called Oracle Support. (I doubt that Oracle Support is this bad.) He says Oracle told him to look for other datafiles around named SYSTEM04.dbf. My Vision Demo had one. So he brought my database down and scp'ed a copy of SYSTEM04.dbf to his server. He looked at me through his thick glasses and said with sincere astonishment, "Would you believe it didn't work, Ron?"
OK, if any Oracle DBAs are reading this, it's milk-squirting-out-your-nose funny: you can't mix-and-match datafiles from different databases. It took all my self-control not to laugh. This is the kind of story I would use as an ice-breaker at Oracle's annual conference. It's tragic but funny as hell. Dude, you worked at Oracle for 17 years? As what--a janitor?
It turned out that all four of his weekly cold backups were unusable. The bottom line is 6 weeks into a 12-week upgrade process committing to 3 test upgrades, we were now at the start again. The PM refused to fire him because he didn't want to train a new DBA. I would have trained a replacement for free. I didn't even trust this guy to go to McDonald's to pick up lunch. (I once went there with him and saw him go around picking up abandoned receipts. An expense report scam.) There's more to the story but save it for a future post.
Let me summarize a few points from the first section and this section: know your platform. For example, Windows on a backserver stores logs in a trace directory off Oracle Home. You reference service controls. Be careful of what an install or upgrade does to your Oracle registry. There are nuances to references in environment variables. Note this point also applies to non-ERP Oracle databases as well:
- You Need To Take Platform Nuances Into Account
Finally, quite often an ERP release includes interoperable integrated Oracle/vendor software--database, middleware, application software, etc. They are typically certified for platform versions, e.g., RHEL 6.6, with various Unix/Linux packages, etc. Oracle contains in Metalink/MOS certification matrices. You can't simply make a unilateral upgrade of one of these components just to be up to date with the latest software. Oracle won't support an uncertified mix of software product releases.
Let me go back to a heated argument I had with the project DBA's on the park district project, referenced twice above. Now some big shops often liked to run one version of the Oracle RDBMS, e.g., run 11.0.3 of Oracle Apps on 8i vs. 8 when they had other databases running 8i. They also wanted to be able to break down the Apps upgrade into separate database and Apps upgrade portions. Now let me first point out that you don't save much time because the database upgrade is maybe one hour in a 3-5 day upgrade window. The point is that you had to run a lot of INTEROPERABILITY PATCHES so you could run 11.0.3 on 8i. And there can be issues with interoperability patches. So here's the top-level view: an upgrade is done in categories; in a typical upgrade you do closeout activities in Categories 1-3 and late in Category 3 you do the database upgrade, followed by the Apps upgrade driver. Now the closeout steps are not very time-consuming, maybe a few hours at most.
This is the point that the consulting PM, BK, allegedly a PhD like me, couldn't seem to grasp, and neither could his staff DBAs: there is no advantage to running interoperability patches just to do closeout activities for the database, but it was like pulling teeth. So they wanted to do the 8i upgrade and interoperability patching in category 1 vs doing the 8i upgrade in category 3. You add a hell of a lot of patching in Category 1 which isn't necessary to get to the upgrade portions of category 3. Here's the next salient point: the client wasn't going to do a two-phase upgrade. We got one outage to go live over 3-5 days. The category 1 option requires a sinkhole of patching for a short-term, at best, benefit. No interoperability patching means I get to the database upgrade and Apps driver sooner than later. This is not rocket science, folks!
I eventually carried the day because the IT deputy manager understood my point. But just remember don't upgrade anything one off until you verify it's certified and know new software versions imply a certain level of interoperability patches. Otherwise, Oracle won't support your installation.
A second example. I got called to join an Apps project in the Virginia Beach area. This was after I finished my work on the airport store chain referenced above. At that time, Developer was part of the middle server configuration (forms, reports, etc.). The client Unix admin had caused a problem because he decided to upgrade to the latest (not Apps certified) version of Developer, not realizing the interoperability issues with what he did.
The Unix admin shadowed me like a hawk. He complained to his boss that there was nothing I was doing that he couldn't do. Dude! The reason the client hired me to the project is because I wouldn't do what you did. While I was there, the trains ran on time. He had caused a problem where consultants costing cumulatively hundreds of dollars an hour were sitting on their hands unable to do their job. I was treated by the consulting PM like a rock star on my arrival. And so my final point for this essay:
- Do Not Make Unilateral Chances to Software Components Unless the Vendor Has Certified the Changes. Understand that Major Component Upgrades May Require Interoperability Patches
Subscribe to:
Posts (Atom)