Friday, April 5, 2019

A Brief History of My Hell in Dealing with Oracle Tech Support

What follows is an excerpt from my flagship political blog (I have a journal format). Probably every Oracle DBA has his own Support stories. I remember when I worked on the West Coast, a number of us would wait until after Colorado tech support rolled off around 9 PM, so we could access more competent Australian Tech Support. I remember one particularly obnoxious moment when I met some Support analysts in passing. I was in an "Oracle University" class (back when I was an Oracle senior principal in the GEHS practice); as I recall, it was an advanced replication course. There was a string of them sitting in the back of class, cracking "dumb client DBA" stories. It wasn't just their incompetence and lack of professionalism: they would outright lie about customers (to gullible duty managers).

I remember one slanderous, totally baseless allegation about me, that I had subjected one female analyst to verbal abuse (swearing, cursing, etc.). She was reportedly so emotionally distraught that her duty manager had to send her home for the day. The best you can say about Oracle on this is that if it happened, it was a case of mistaken identity. I keep my cool even under adverse circumstances like terminations. [I sometimes say that I must have this Howdy Doody personality, because people will pick me out of a crowd to ask me to retrieve an item from a supermarket shelf, ask me for directions, etc.] I spent 8 years in academia, teaching literally hundreds of students. I had to deal with all sorts of rogue student behavior. Probably the most outrageous was when I warned students about my academic dishonesty policy (without identifying the students in question), and a young woman (in the most ludicrous self-incrimination I've ever heard) guessed (correctly) that she was implicated and had a complete emotional meltdown in lecture. I never singled out a student in front of his classmates; if a student asked a question that I had answered 5 minutes earlier, I didn't make a fuss over it, lecture him about not paying attention, etc. It was just easier to repeat the answer. Now I might push back on something wrong an analyst said, but yelling insults or swearing at people is not in my personality. I don't see it as productive. If I had to deal with an unpleasant or incompetent analyst, I would escalate the issue to a more senior analyst. I never went to a duty manager to lodge a complaint against a specific analyst; I was always more focused on getting the underlying issue resolved, and once the rogue analyst was out of the way, I figured it was Oracle's problem to deal with him.

Much of my dissatisfaction was not having access to Oracle's internal knowledge base, and I was frustrated because many of their analysts didn't know how to query it correctly. In particular, in reference to the 12.2.1.3.0 SR I describe below, the ultimate fix was modifying a configuration parameter. I must have gone through a handful of Engineering Support analysts, but the last analyst solved it in about 10 minutes, noting that another government client with an ODA (HHS?) had run into the same issue in an upgrade from 11G to 12C, and at some point they had done a bare metal restore.

One of the things that particularly annoys me is the busy work often required in filing SR's, even when you're just asking Engineered Systems when the next patch is going to be released (which was always futile: it'll be released when it's released). In part, I was having to answer to a government auditor wanting to know why there was a lag between CPU patch releases and ODA bundle patches. Oracle was often wanting something like 5 GB of logs, etc. uploaded, and it seemed like maybe 25% of the time their upload process was broken--and then, of course, they would never do anything with the logs, but it's what they needed to punch your ticket.

I've been a professional Oracle DBA for 26 years. It would take probably a dozen posts of this size to run through all the issues I've encountered with Oracle Support, but the following is a good start:

I'm going to attempt to be as readable as possible in describing a couple of Oracle Tech Support scenarios. I'm sure almost every reader has their own past issues with tech support. Let me describe the typical Oracle Support scenario during the late 90's when most issues were still handled through calling an 800 number. (Oracle Support is not "free"; don't quote this percentage, but it's close enough: maybe around 22% of your license costs annually.) Oracle has a support portal, originally called Metalink, more recently My Oracle Support; this includes a knowledge base/search engine of notes, etc., on error conditions, best practices, and other material. Oracle Support has access to a superset of that KB called webiv, including information not available to Oracle customers.

Oracle trains a lot of inexperienced analysts just out of college. So they may take your issue symptoms, feed them to webiv, maybe pull up 40 or so hits, almost all totally irrelevant to your context. And they want to go through each link in order (and demand you provide evidence that the irrelevant citation doesn't hold); if you push back on any of that, they'll accuse you of having a bad attitude and threaten to close the TAR (technical assistance request)/SR (service request). In the real world as a production DBA, I don't have the time or patience to deal with inexperienced analysts. What helped was escalating the issue to a senior analyst; I can think of at least a couple of dozen occasions where a senior analyst resolved my issue literally within 5-10 minutes.

Oracle Support duty managers hate me with a passion because, in their view, I was escalating issues too frequently (not true, but that's a judgment call). For the most part, I consider duty managers not-so-bright, self-important incarnations of the Peter Principle. Keep in mind none of these people ever directly interacted with me. But consider a couple of telling incidents. While my boss was out of the office, the switchboard forwarded a screaming duty manager to me, and this guy, clueless that he was talking to me, started lying about me in his rant. In a second example, while I was a senior principal (highest non-managerial rank) at Oracle Consulting, a hostile duty manager called my practice manager and promised to help recruit my replacement if he would fire me.

I think I've previously discussed my initial bad experience with Oracle Support, around the summer of 1996, when I was working as a subcontractor for a large credit card issuer in the St. Louis suburbs. The company probably had the worst setup for new people I've ever seen. For several weeks I was assigned to a cubicle where I didn't have access to voice mail or a network account, so basically I had to work at an open workstation out of sight from my cubicle. So, among other things, I wouldn't be able to hear or answer my cubicle phone--and even if there was a call that dropped to voice mail, I couldn't retrieve it.

So my boss, basically another contractor (different company), told me to call Oracle Support and not to settle for anything less than a response to a technical issue. (He headed the production DBA efforts (with OPS, a precursor to RAC, which involves multiple servers with access to the same database, effectively supporting nearly 100% uptime) while I maintained the (non-OPS) test database.) The client had nearly 2 dozen "DBA's" being transitioned from mainframe roles, but basically contractors were doing the real work.

I called up Support and got connected to a Support analyst doing a form of computer triage; he determined that the issue I was trying to address didn't deserve same day service. The client had an expensive "Gold Service" maintenance agreement. I tried to argue the point, given my boss' instructions, but I got nowhere with this Oracle cog. In fact, he started ranting that I was violating Gold Support standards, and Oracle could decide to drop Gold Support for the client.

It didn't stop there. It turns out one of the mainframe DBA retreads served as the local Metalink administrator. So the analyst or his duty manager ends up calling up the client "DBA", starts ranting about this rogue contractor, and repeats the (empty) threat to revoke Gold Support coverage. Now client management (who really didn't know me) are pissed off at me; if my boss said anything to defend me, I didn't know about it.

The Oracle troublemaking bastards weren't done with me. As the reader may have guessed (given my explanation of voicemail access), if and when Oracle Support did call me back on the SR, it went to voicemail. Now the SOBs are going back to the client Metalink administrator and complaining that this rogue contractor is refusing to return their calls and they want to close the TAR. Do you think the client management takes responsibility for the fact that, after weeks, I still don't have voicemail access, and fixes the problem? Of course not. It's easier for them to accept Oracle's complaints at face value. (As I recall, they let me go not that long after that. Luckily I had written into my contract that my employers would have to pay my moving expenses back to the Chicago area. The clients wouldn't agree for me to travel from Chicago.)

It would take me several book chapters to chronicle my issues over the years with Oracle Support. But I'll settle for a couple of recent examples that I think speak for themselves and illustrate the level of incompetence that I've had to deal with.

A few years back Oracle purchased Sun Microsystems; probably a good plurality of my first 15 years or so of professional DBA work dealt with Sun hardware and their signature Solaris (Unix) operating system, although I think their main motivation was to acquire rights to the very popular Java language.

One thing that Oracle did with their hardware acquisitions is develop the concept of a database appliance. Basically the Oracle Database Appliance (Oracle engineered systems) provided Oracle customers with a one-stop shop concept; instead of customers having to deal with vendors pointing at each other over technical issues, Oracle assumed that risk and provided comprehensive support and patching for everything from firmware to RAC database patching through ODA patching. Unlike Oracle's quarterly security "Critical Patch Updates", ODA patches (including CPU), in my experience, were released at irregular intervals ranging from 6 weeks to 5 months.

So my first example was an issue that affected my development ODA. One of the 2 clustered servers would not update. As I seem to recall, it was eventually resolved by replacing a configured parameter that affected ODA's that had undergone a prior bare metal reinstall (done by a predecessor in my role). But when I initially filed the SR, the Oracle Engineered Systems Support analyst denied that the ODA bundle patch in question (it may have been 12.2.1.3.0) had been released, demanded to know where I got this unofficial (beta) software, and reminded me Oracle Support didn't support beta software. I got the usual presumptuous lecture (I think it's a script they all memorize) that I needed to read ODA master note 888888.1; I knew 888888.1 better than any Support analyst I ever dealt with.

I have no idea how bad internal communications are at Oracle Engineered Systems, but I have no clue how Support engineers were unaware of a released patch. But 888888.1 noted around the transition from 12.1 to 12.2 ODA patching, that client DBA's should identify new versions from release note links on the OES help/home page. [Think that Oracle Engineering would send a new bundle patch alert to their customer distros like for CPU patching? No, that would make too much sense.] (Eventually 888888.1 would be updated to link to the latest patch.)

So I patiently tried to walk him through the publicly available link to the release notes and the hot link in the release notes to the platform patch download. But there was no reasoning with him; he was in a state of denial. I think I ended up having the SR reassigned. This was embarrassing for Oracle because you can't explain away that kind of incompetence.

The second example requires an overview of ODA patching under the original OAKCLI (command line interface). There is an unpacking operation of the ODA patch from typically 3 zipfiles to local root-accessible directories. Then one basically does version updates of up to 3 components: system, storage, and database.

There was no 12.2.1.5.0; we had to go from 12.2.1.4.0 to 18.3. This was a major change in ODA patching for my version of the ODA: we would be transitioning from OAKCLI to the DCS model with its ODACLI command language during the system component upgrade. The documentation then outlines (supposedly) the process of using ODACLI to patch the storage and database components.

It turns out that there's a separate zipfile/"updating the repository" [unpacking] step for the database component. And the DOCUMENTATION STOPS THERE. (The last time I checked a few weeks back they still hadn't modified this.) Conceptually, unpacking the software/updating the repository doesn't involve applying the patch. You know, is the emperor wearing clothes? On my own, I discovered the missing piece (a verb to update the database home) using a published OAKCLI/ODACLI cross-reference. I'm trying to explain to the Support analyst: "Dude, my RAC software hasn't been updated from the April to July CPU." I had already filled in the missing pieces. I thought they had a duty to fix the documentation so DBA's following me wouldn't have to resolve the issue on their own. As a documentation guru, I had no idea how they could leave their documentation incomplete during a one-time migration process. This is not rocket science; they simply were too incompetent to fix their own mistakes, even when a customer pointed out the problem and what to do to fix it.