How to Analyze Oracle 12c Error Messages - dummies

How to Analyze Oracle 12c Error Messages

By Chris Ruel, Michael Wessler

Ever hear the expression “hiding in plain sight”? That phrase often applies when people see an Oracle 12c error message. They see the message, but they don’t actually read it and think about what it says. As a result, the most valuable clue you have isn’t fully maximized.

Avoid falling into that trap. Make the most of your error messages:

  • Slow down and read the error message — several times if necessary. Think about what it’s saying. Don’t just rattle off ORA-1234 and the description. Ask yourself what specific action is failing based on the context of the error message and what is going on at the time of the message.

  • Pretend you’re the application and ask what you were doing when the error occurred. Then apply the text of the message to see which piece or action is failing. Breaking down a larger process into individual steps and performing each step to see where something breaks is an effective troubleshooting technique.

  • Apply most of your focus on the first error message you receive. Often a series of error messages occurs related to one event, but typically that first message is the cause of the other messages.

  • Know the types of error messages and which components they relate to:

    • ORA denotes database or SQL errors.

    • TNS denotes database listener or Oracle Net communication issues.

    • HTTP is web related.

    • LDAP denotes details with your directory server, perhaps Oracle Internet Directory.

    The architecture of your system determines what components may generate errors. Know what components exist within the system and the process flow so you can tell what part of the system is failing based on the type of message.

  • Become familiar with normal messages versus extraordinary error messages. Many harmless informational messages crop up for events that aren’t errors — particularly when working with log files. Know what your system logs look like during normal operations so that when real errors occur, you can identify them easily.

  • Plug the error message into my Oracle Support (formerly Oracle Metalink) and your favorite Internet search utilities to get more detailed descriptions and possible fixes. Expect lots of irrelevant information and false leads. But odds are good that your search results will also include information that helps identify and fix the problem.

Basic database and Oracle 12c system anatomy

A firm understanding of Oracle database architecture and processes is key to your ability to understand error messages and diagnostic output. Think of it as database anatomy. You would flee if your doctor said “I don’t really understand that heart stuff,” right? Similarly, what kind of Database Administrator (DBA) is clueless about the SYSTEM tablespace.

Where in the overall system does your database fit, and what are the components? If people are reporting an HTTP-404 error, you probably want to get the web administrator involved because the problem may be a web server or content. But if your database generates the HTML content via mod_plsql web Toolkit, it may actually be your database having issues.

Not knowing that would result in the problem being routed to the wrong people, further delaying the fix. There simply is no substitute for knowing the specifics of how your system works and being able to apply that knowledge.

Error system example in Oracle 12c

Look at this example of a common message that confuses people. Upon analysis, the cause is simple to identify.

ORA-01034: ORACLE not available.

ORA denotes a database message (versus TNS for a listener or HTTP for a web error). What does Instance not available mean in terms of databases? If you know database architecture, you know that an instance is the memory and background processes for a database. Thus, the database instance may not be running; you need to confirm that.

A quick ps –ef | grep pmon on the server shows no PMON process running, so now you have confirmed the database instance is down. A further check using the uptime command shows the server was restarted 15 minutes ago; you can assume the database didn’t restart after a server crash or reboot.

At this point, you can check database logs to see whether the instance tried to restart and failed or no attempt was made to restart. Based on that, you can manually restart the database and get users back to work.

oerr utility in Oracle 12c

The oerr utility gets more information about an error message. This command-line utility is where you specify an error number and the oerr utility provides the most likely problem causes and possible solutions. Although it isn’t in-depth troubleshooting, it is very handy:

$ oerr ora 1034
01034, 00000, "ORACLE not available"
// *Cause: Oracle was not started up. Possible causes include the following:
//     - The SGA requires more space than was allocated for it.
//     - The operating-system variable pointing to the instance is
//      improperly defined.
// *Action: Refer to accompanying messages for possible causes and correct
//     the problem mentioned in the other messages.
//     If Oracle has been initialized, then on some operating systems,
//     verify that Oracle was linked correctly. See the platform
//     specific Oracle documentation.