Sunday, June 1, 2014

ORA-04031 error troubleshooting

ORA-04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\,\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".


ORA-4031 error occurs due to following reasons.
  a.       Memory unavailable for use or reuse in SGA.
  b.      Auto-tuning issues
  c.      Fragmentation in memory due to application design.
  d.      Bug/leaks in memory allocations.

Instance Parameters related to Shared pool
Ø  SHARED_POOL_SIZE - This is the size of the shared pool in BYTES and can accepted in K (x by 1000) and M (x by 1000000)
Ø  SHARED_POOL_RESERVED_SIZE
ü  It is the shared pool size which is reserved for large contiguous request for shared pool memory.  This parameter along with SHARED_POOL_RESERVERD_MIN_ALLOC can be used to avoid the occurrence of this error from situations where shared pool fragmentation forces oracle to search for and free chunks of unused memory to satisfy the request.                
ü  Ideally, this parameter should be large enough to satisfy the request of scanning the memory on the reserved list without flushing objects from the shared pool.
ü  This parameter should be 10% of the SHARED_POOL_SIZE

Ø  _SHARED_POOL_RESERVED_MIN_ALLOC  (hidden parameter )
ü  This value controls the allocation of reserved memory.
ü  Memory allocation larger than this value can allocate space from the reserved list if a chunk of memory of sufficient size is not found on the shared pool free lists.
ü  Default value (4400) is sufficient in most systems.
ü  If we increase the value, oracle will allow fewer allocations from the reserved list and will request more memory from the shared pool free list.  
ü  Value can be found using the query in step 6.

Cause for fragmentation in memory

Ø  Oracle shared pool is managed by HEAP MEMORY MANAGER and is divided into chunks (think as blocks in tablespaces) and the free memory chunks are managed by memory freelists.
Ø  When a sql statement is parsed in shared pool, it needs memory for storing execution plan  and other internal structures.
Ø  When the execution is completed, the memory is freed and released back to the freelist. This repeated use and release of memory structures causes FRAGMENTATION in shared pool.

Diagnosing

When ORA-4031 error occurs, a trace file will be created and it is updated in the alert log if the error is due to background process. Though there are cases that this error is not written in alert log but trace file exists in user_dump_dest.
e.g ORA-4031: unable to allocate 2196 bytes of shared memory
(shared pool,JOB$","KGLS heap","KGLS MEM BLOCK))
ORA-4031 has 3 arguments
  a.       Size requested
  b.      Area in the SGA
  c.       Description/comment.

Algorithm for memory allocation is as follows.
a.       First it will try to get the memory from the shared pool  and if the requested memory is more than _SHARED_POOL_RESERVED_MIN_ALLOC ,
b.      Then it will get memory from shared pool reserved area.

Inadequate sizing: Due to lack of contiguous space in the library cache.
REQUEST_FAILURES  > 0 and LAST_FAILURE_SIZE  > SHARED_POOL_RESERVED_MIN_ALLOC

Solution :  Increase _SHARED_POOL_RESERVED_MIN_ALLOC to lower the number of objects being cached to the shared pool reserved area and increase SHARED_POOL_RESERVED_SIZE and SHARED_POOL_SIZE  to increase the available memory in the shared pool reserved space.
In 10g, this done automatically and can be seen in V$SGA_DYNAMIC_COMPONENTS and V$SGA_RESIZE_OPS view.
In 11g, this is done by MEMORY_TARGET , can be seen in V$MEMORY_DYNAMIC_COMPONENTS V$MEMORY_RESIZE_OPS view.

Fragmentation : Need to find fragmentation in library cache.
REQUEST_FAILURES > 0 and LAST_FAILURE_SIZE  <  SHARED_POOL_RESERVED_MIN_ALLOC.
Solution: Lower SHARED_POOL_RESERVED_MIN_ALLOC to allow the database putting more objects in shared pool reserved space. If the problem is not yet solved, consider increasing SHARED_POOL_SIZE.

Metalink note (146599.1)

As soon as you or application user gets ORA-4031 related to shared pool ,  following steps needs to be done.  For general analysis on ORA-4031 error, go through metalink 430473.1
Step 1: Check in the error message to which pool  the error is hitting.  If it is LARGE_POOL_SIZE or JAVA_POOL _SIZE we need to increase the size in initialization file and restart the database.
Eg: ORA-04031: unable to allocate bytes of shared memory ("shared pool","unknown object","joxlod: init h", "JOX: ioc_allocate_pal")
ORA-04031: unable to allocate 40 bytes of shared memory ("large pool","unknown object","kxs-heap-f","frame segment")

Step 2: If it is the development environment, try to flush the shared pool .
Alter system flush shared_pool;
Step 3 : If the user comes back again, then we need to perform following steps if it is related to Shared pool as the error could be due to following.  
a.       Under size of the SGA_TARGET  or shared pool size manually.
b.      Memory leak
c.       Bug.
d.      Fragmentation in shared pool.
Step 4: Next, check in alert log for the error log and the corresponding trace file.
Step 5: Check the parameters (SGA_TARGET,MEMORY_TARGET) to see whether ASMM or AMM is configured .


Step 6: What is the size of the SHARED_POOL_SIZE. Metalink note (146599.1)
Set lines 150
Set pages 30000
COLUMN parameter                FORMAT a37
COLUMN description               FORMAT a30 WORD_WRAPPED
COLUMN "Session VALUE"     FORMAT a15
COLUMN "Instance VALUE"    FORMAT a15
SELECT
   a.ksppinm  "Parameter",
   a.ksppdesc "Description",
   b.ksppstvl "Session Value",
   c.ksppstvl "Instance Value"
FROM
   x$ksppi a,
   x$ksppcv b,
   x$ksppsv c
WHERE
   a.indx = b.indx
   AND
   a.indx = c.indx
   AND
   a.ksppinm LIKE '/_shared%' escape '/'
;
Step 7: Check in V$SHARED_POOL_RESERVED to see the misses of memory report over the life of the database.  Metalink note (146599.1)
set lines 150
set pages 30000
select  Free_space,avg_free_size,max_free_size,used_space,avg_used_size,
max_used_size,request_misses,request_failures "Number of time ORA4031 errors" from V$SHARED_POOL_RESERVED;

FREE_SPACE AVG_FREE_SIZE MAX_FREE_SIZE USED_SPACE AVG_USED_SIZE MAX_USED_SIZE REQUEST_MISSES Number of time ORA4031 errors
---------- ------------- ------------- ---------- ------------- ------------- -------------- -----------------------------
  31934232    347111.217        503624    1304952    14184.2609        503624              0                           237
SQL> select LAST_FAILURE_SIZE from v$shared_pool_reserved;

LAST_FAILURE_SIZE
-----------------
             4192

Step 8:  See how much free space is available in SGA.
Select * from v$sgastat;

Step 9:  In case, if you want to see the breakdown of the memory in SGA and also the breakdown of memory used in top 5 subheaps in SGA., we need to trace the memory dump, when we are getting the error.   Metalink note (369640.1)

alter system set events 'immediate trace name heapdump level 536870914';
or

sqlplus /nolog
connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump heapdump 536870914
oradebug tracefile_name
oradebug close_trace

Close the SQL*Plus session and find the heapdump trace file listed in the 'oradebug tracefile_name' command above.
If the problem is actually associated with permanent memory structures (tracked under the 'miscellaneous' table entry), there is not a way to get information on these memory areas unless you set the event 10235 level 65536.  This event should only be set under direction from Oracle Support


Step 10 :  Check for the shared pool size and if you have less space , check for following informations,
a.       Library cache hit ratio.
The hit ratio helps to measure the usage of the shared pool based on how many times a SQL/PLSQL statement needed to be parsed instead of being reused. The following SQL statement help you to calculate the library cache hit ratio:

SELECT SUM(pins) "EXECUTIONS",   SUM(reloads) "CACHE MISSES WHILE EXECUTING"  FROM v$librarycache;

If the ratio of misses to executions is more than 1%, then try to reduce the library cache misses by increasing the shared pool size.

b.      Shared pool size calculation.

Use the scripts below . unfortunately I couldn't make the scripts available in this page. Will make sure the link is available soon. But these can be found in Oracle Support/Metalink.

Step 11 :  If the shared pool size is large enough and you see still ORA-4031 error, then the memory would have got fragmented.  To avoid fragmentation, we need analyze how the application is using the shared pool and maximize the use of sharable cursors.
a.       Check for value in CURSOR_SHARING
b.      Check whether value of SESSION_CACHED_CURSORS is  TOO HIGH
c.       Check the sql area view to find the candidates for using the bind variables.

SELECT SUBSTR(sql_text,1,40) "SQL", COUNT(*), SUM(executions) "TotExecs"  FROM v$sqlarea WHERE executions < 5 GROUP BY SUBSTR(sql_text,1,40) HAVING count(*) > 30  ORDER BY 2;
Note: The number "30" in the having section of the statement can be adjusted as needed to get more detailed information.

d.      Check x$KSMLRU
ü  This fixed table will keep track on the allocations in the shared pool that cause other objects in the shared pool to aged out. 
ü  This table is used to find what is causing the aging out.
ü  The contents in x$ksmlru table is erased whenever someone selects from the fixed table.  This is because to have only large allocation query  alone in this fixed table.  So, careful output is set as we cannot retrieve the query.
ü  To monitor, just run.
SELECT * FROM X$KSMLRU WHERE ksmlrsiz > 0;



e.      Use the below query to see how the free space is allocated. But be careful, as this query would HANG the database due to bug.
Be aware of Bug:4715420 stating that selecting from X$KSMSP is asking one session to hold the shared pool latches in turn for a LONG period of time and should be avoided on live systems. Selecting from X$KSMSP on a production system is a very bad idea.

Using this view you will be able to find out how the free space is currently allocated, which will be helpful to understand the level of fragmentation of the shared pool. As it was described before, the first place to find a chunk big enough for the cursor allocation is the free list. The following SQL shows the chunks available in the free list:

SELECT '0 (<140)' bucket, ksmchcls, ksmchidx, 10*TRUNC(ksmchsiz/10) "From",
       COUNT(*) "Count", MAX(ksmchsiz) "Biggest",
       TRUNC(AVG(ksmchsiz)) "AvgSize", TRUNC(SUM(ksmchsiz)) "Total"
FROM x$ksmsp
WHERE ksmchsiz<140
AND ksmchcls='free'
GROUP BY ksmchcls, ksmchidx, 10*TRUNC(ksmchsiz/10)
UNION ALL
SELECT '1 (140-267)' bucket, ksmchcls, ksmchidx,20*TRUNC(ksmchsiz/20),
       COUNT(*), MAX(ksmchsiz),
       TRUNC(AVG(ksmchsiz)) "AvgSize", TRUNC(SUM(ksmchsiz)) "Total"
FROM x$ksmsp
WHERE ksmchsiz BETWEEN 140 AND 267
AND ksmchcls='free'
GROUP BY ksmchcls, ksmchidx, 20*TRUNC(ksmchsiz/20)
UNION ALL
SELECT '2 (268-523)' bucket, ksmchcls, ksmchidx, 50*TRUNC(ksmchsiz/50),
       COUNT(*), MAX(ksmchsiz),
       TRUNC(AVG(ksmchsiz)) "AvgSize", TRUNC(SUM(ksmchsiz)) "Total"
FROM x$ksmsp
WHERE ksmchsiz BETWEEN 268 AND 523
AND ksmchcls='free'
GROUP BY ksmchcls, ksmchidx, 50*TRUNC(ksmchsiz/50)
UNION ALL
SELECT '3-5 (524-4107)' bucket, ksmchcls, ksmchidx, 500*TRUNC(ksmchsiz/500),
       COUNT(*), MAX(ksmchsiz) ,
       TRUNC(AVG(ksmchsiz)) "AvgSize", TRUNC(SUM(ksmchsiz)) "Total"
FROM x$ksmsp
WHERE ksmchsiz BETWEEN 524 AND 4107
AND ksmchcls='free'
GROUP BY ksmchcls, ksmchidx, 500*TRUNC(ksmchsiz/500)
UNION ALL
SELECT '6+ (4108+)' bucket, ksmchcls, ksmchidx, 1000*TRUNC(ksmchsiz/1000),
       COUNT(*), MAX(ksmchsiz),
       TRUNC(AVG(ksmchsiz)) "AvgSize", TRUNC(SUM(ksmchsiz)) "Total"
FROM x$ksmsp
WHERE ksmchsiz >= 4108
AND ksmchcls='free'
GROUP BY ksmchcls, ksmchidx, 1000*TRUNC(ksmchsiz/1000);


f.        We can also use below query to get the output.  Again please AVOID in LIVE systems.
SELECT ksmchcls CLASS, COUNT(ksmchcls) NUM, SUM(ksmchsiz) SIZ,
TO_CHAR(((SUM(ksmchsiz)/COUNT(ksmchcls)/1024)),'999,999.00')||'k' "AVG SIZE" 
FROM X$KSMSP GROUP BY ksmchcls; 
CLASS           NUM        SIZ AVG SIZE
-------- ---------- ---------- ------------
R-free           12    8059200      655.86k
R-freea          24        960         .04k
free            331  151736448      447.67k
freeabl        4768    7514504        1.54k
perm              2   30765848   15,022.39k
recr           3577    3248864         .89k

*      if free memory (SIZ) is low (less than 5mb or so) you may need to increase the shared_pool_size and shared_pool_reserved_size.
*      if perm continually grows then it is possible you are seeing system memory leak.
*      if freeabl and recr are always huge, this indicates that you have lots of cursor info stored that is not releasing.
*      if free is huge but you are still getting 4031 errors, (you can correlate that with the reloads and invalids causing fragmentation).


Step 12:
If none of the above techniques doesn’t help to resolve the issues, we need to trace the ORA-4031 error sessions as below. This will get the snapshot of the shared pool when the error occurs.  
Modify the init.ora parameters to add the following events to get the trace file information.
Remember, we need to bounce the instance.

event="4031 tracename errorstack level 3"
event="4031 tracename HEAPDUMP level 2"

alter session set events '4031 tracename errorstack level 3';
alter session set events '4031 tracename HEAPDUMP level 2';
We can even perform at session level if we know the session of the faulty sql statement.


Good Links for ORA-4031
Metalink ID : 19837.1
Metalink ID : 146599.1
Metalink ID : 430473.1



No comments:

Post a Comment