How to Troubleshoot with Your Oracle 12c System Methodology - dummies

By Chris Ruel, Michael Wessler

If an Oracle 12c crash hasn’t happened to you yet, it eventually will. When it does, it won’t be at a convenient time.

The problem is that people assume that because they have an Oracle-based system, the problem must be with Oracle. It could be, but you don’t know just yet. Oracle is simply a component of a larger system, and the root cause and solution may not be Oracle-based. Even if you get an Oracle database error message, the cause may be something outside of Oracle.

Be sure to look at the overall system supporting the database, not just the database itself.

Don’t take a problem report at face value. Apply a structured, repeatable pattern when addressing problems. This next statement cannot be stressed enough: Yours is a technical profession, and you’re paid to solve problems, not simply to react and hope for a quick fix.

Everyone has a troubleshooting methodology tailored for their unique environment, but the following is a start:

  1. Identify the real problem. Determine and confirm what’s happening in the system.

  2. Perform basic system checks. Check the server, operating environment, and connectivity for outright errors and performance degradation.

  3. Perform basic database checks. Confirm that the database is running and see whether you can log in to it.

  4. Determine what your error messages mean.

  5. Develop a solution and apply it. Confirm that the fix works and that there aren’t unintended consequences.

With experience and time, you will modify these steps for your environment. Depending on the situation, you may move through some steps very quickly, but you still perform them; you don’t skip them.
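Steps 2 and 3 are easy to script so that they get run the same way every time. The following Python sketch is one possible starting point, not a complete health check. It assumes a Unix-like database server, a listener on the default port 1521, and the usual `ora_pmon_<SID>` naming of the PMON background process. The function names (`port_is_open`, `running_instances`, `basic_checks`) are made up for this illustration.

```python
from __future__ import annotations

import socket
import subprocess


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def running_instances(ps_output: str) -> list[str]:
    """Pull instance names out of `ps` output by looking for PMON processes."""
    prefix = "ora_pmon_"
    sids = set()
    for line in ps_output.splitlines():
        for token in line.split():
            if token.startswith(prefix) and len(token) > len(prefix):
                sids.add(token[len(prefix):])
    return sorted(sids)


def basic_checks(host: str, port: int = 1521) -> dict:
    """Run both quick checks on the database server and return the findings.

    Assumption: this runs on the database server itself, so `ps` can see
    the instance processes. Port 1521 is only the default listener port.
    """
    ps = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=False
    )
    return {
        "listener_port_open": port_is_open(host, port),
        "instances_running": running_instances(ps.stdout),
    }
```

Calling `basic_checks("localhost")` on the database server returns a small dictionary you can paste into a ticket. An open port and a running PMON process don’t prove that users can log in, so you still need to attempt a real login to finish step 3.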

Before doing anything to fix the perceived problem, you need to know what the real problem is. You can’t guess or assume. It’s far better to treat the real cause of a problem, not just the symptoms.

People reporting problems get excited, miss key details, make assumptions, and often inaccurately state the nature and severity of a problem; that’s simply human nature. If you think otherwise, ask any cop or ER doctor about the quality of the initial witness reports they receive. This is exacerbated in computer work because many people who are reporting problems aren’t technical and can’t articulate their problems very well.

You need to determine what system component has the problem and what is specifically happening before you can develop and apply a fix. Ask the following questions:

  • What: What specifically is happening? Have the user walk through what they’re doing when the error occurs. Work directly with the person having problems and monitor the issue in real time rather than getting second- or third-hand information. Get screenshots or the error messages themselves.

  • Who: Who’s being impacted? Is it one or two users? Is it a specific subclassification of users? Is it everyone? Also, is it your production, test, or development system? Never assume that because someone is excited, it must be production. Trying to fix the wrong database will leave you blushing with embarrassment.

  • Where: Are affected users spread over a wide geographic area, or are they in a specific city or building?

  • When: How long has this been occurring, and has it occurred before? Also, does it happen every time or just sometimes? If it happens only occasionally, drill down into what’s being done prior to the error.

    If it occurred only since a recent system change (such as a patch, an upgrade, or a reboot), that can be a valuable clue. The question “What has recently changed in the system?” is a great one to ask!

  • How bad: Is this a total loss of service where the company is stopped, or is it just an annoyance on a seldom-used development system?

After asking these questions, you should know what’s happening, who it’s happening to, how bad it is, and when it started. You should also have a rough idea of what subsystem or components to start checking.
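One way to make sure none of these questions gets skipped under pressure is to record the answers in a fixed structure. The Python sketch below is only an illustration of that idea. The `ProblemReport` class and its field names are hypothetical, and the sample values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProblemReport:
    """Answers to the triage questions; None means 'not asked yet'."""

    what: Optional[str] = None            # exact symptom or error message
    who: Optional[str] = None             # which users are affected
    where: Optional[str] = None           # location of the affected users
    when: Optional[str] = None            # since when, and how often
    how_bad: Optional[str] = None         # total outage or minor annoyance
    environment: Optional[str] = None     # production, test, or development
    recent_changes: Optional[str] = None  # patches, upgrades, reboots

    def unanswered(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_to_investigate(self) -> bool:
        """True only when every question has an answer."""
        return not self.unanswered()


# Invented example: two answers collected so far.
report = ProblemReport(
    what="Login fails with an error message",
    who="Two users in accounting",
)
print(report.unanswered())
```

Here the printed list names the five questions that still need answers, including which environment is affected and what has recently changed.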

Keep a cool head when troubleshooting hot issues; be methodical and work in a logical manner until the problem is fixed (and confirmed to be fixed). Other people may become excited, stressed, or unprofessional, but you need to keep your wits and professionalism as you work toward a solution. Don’t let yourself be intimidated by irate users or management standing over your shoulder.