Fixing OGG's Time Since Chkpt

 2018-04-19    Dalian    Sunny /oracle/2018/04/19/ogg-tsc.html oracle oracle, linux

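I ran into a problem where Oracle GoldenGate showed an abnormal status of unknown in the Time Since Chkpt column. The steps below fixed it, so I am recording them here.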

[oracle@localhost ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 20:20:21

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     00:00:00      unknown    
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown    
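
With many groups it can help to pick out the affected ones mechanically. The following is a minimal sketch, not part of the original procedure: `unknown_chkpt_groups` is a hypothetical helper name, and it assumes the five-column `info all` layout shown above, where the fifth field is Time Since Chkpt.

```shell
# Hypothetical helper: list groups whose "Time Since Chkpt" column is "unknown".
# Reads `info all` output on stdin and prints one group name per line.
# Assumes the column layout shown above: Program, Status, Group, Lag, Time Since Chkpt.
unknown_chkpt_groups() {
    awk '($1 == "EXTRACT" || $1 == "REPLICAT") && $5 == "unknown" { print $3 }'
}
```

Assuming `ggsci` is on the PATH and accepts commands on standard input, it could be used as `echo "info all" | ggsci | unknown_chkpt_groups`.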

Trying to stop the affected processes fails:

GGSCI (localhost.localdomain) 2> stop *

Sending STOP request to EXTRACT EXT12345 ...

ERROR: sending message to EXTRACT EXT12345 (Timeout waiting for message).

Sending STOP request to EXTRACT EXT67889 ...

ERROR: sending message to EXTRACT EXT67889 (Timeout waiting for message).

Sending STOP request to EXTRACT PUMP1234 ...

ERROR: sending message to EXTRACT PUMP1234 (Timeout waiting for message).

Sending STOP request to EXTRACT PUMP5678 ...

ERROR: sending message to EXTRACT PUMP5678 (Timeout waiting for message).

Sending STOP request to REPLICAT REP12345 ...

ERROR: sending message to REPLICAT REP12345 (Timeout waiting for message).

Try stopping MANAGER:

GGSCI (localhost.localdomain) 3> stop mgr!

Sending STOP request to MANAGER ...
Request processed.
Manager stopped.

Check the status again:

GGSCI (localhost.localdomain) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                          
EXTRACT     RUNNING     EXT12345     00:00:00      unknown    
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown    

MANAGER is now stopped, but the EXTRACT and REPLICAT processes are still running.

At this point the processes cannot be ended with the GGSCI kill command, because it needs a running Manager:

GGSCI (localhost.localdomain) 5> kill EXT12345

ERROR: Manager not currently running.

GGSCI (localhost.localdomain) 6> kill EXT67889

ERROR: Manager not currently running.

Check the status:

GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                          
EXTRACT     RUNNING     EXT12345     00:00:00      unknown    
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown  

Exit GGSCI:

GGSCI (localhost.localdomain) 8> exit

List the OGG processes at the operating-system level:

[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle    7479     1  0 Nov10 ?        00:03:31 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT12345.prm REPORTFILE /opt/OGG/dirrpt/EXT12345.rpt PROCESSID EXT12345 USESUBDIRS
oracle    7480     1  0 Nov10 ?        00:02:30 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT67889.prm REPORTFILE /opt/OGG/dirrpt/EXT67889.rpt PROCESSID EXT67889 USESUBDIRS
oracle    7483     1  0 Nov10 ?        00:00:01 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP1234.prm REPORTFILE /opt/OGG/dirrpt/PUMP1234.rpt PROCESSID PUMP1234 USESUBDIRS
oracle    7485     1  0 Nov10 ?        00:00:03 /opt/OGG/replicat PARAMFILE /opt/OGG/dirprm/REP12345.prm REPORTFILE /opt/OGG/dirrpt/REP12345.rpt PROCESSID REP12345 USESUBDIRS
oracle    7518     1  0 Nov10 ?        00:00:01 ./server -p 7847 -k -l /opt/OGG/ggserr.log
oracle    7677     1  0 Nov10 ?        00:00:15 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP5678.prm REPORTFILE /opt/OGG/dirrpt/PUMP5678.rpt PROCESSID PUMP5678 USESUBDIRS
oracle   25261 25112  0 24:48 pts/1    00:00:00 grep /opt/OGG

If the command above finds nothing, try the following instead:

ps -ef | grep <replicat name>;
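
Copying PIDs by hand from the listing is error-prone, so the extraction can be sketched as a small filter. This is an illustration only: `ogg_pids` is a hypothetical name, `/opt/OGG` is the OGG home from the listing above, and the filter assumes the Linux `ps -ef` layout in which the PID is the second field and the command name is the eighth.

```shell
# Hypothetical helper: print the PIDs of processes whose command line mentions
# the OGG home directory, skipping the grep process itself.
# Reads `ps -ef` output on stdin; $1 is the OGG home, e.g. /opt/OGG.
ogg_pids() {
    awk -v home="$1" 'index($0, home) && $8 != "grep" { print $2 }'
}
```

For example: `ps -ef | ogg_pids /opt/OGG`. On systems that have it, `pgrep -f /opt/OGG` is a similar one-liner that never matches itself.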

Kill the processes:

[oracle@localhost OGG]$ kill -9 7479 7480 7482 7483 7485  7518 7677
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle   25264 25112  0 24:48 pts/1    00:00:00 grep /opt/OGG
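
The session above goes straight to `kill -9`, which gives a process no chance to clean up. A gentler variation, shown here only as a sketch and not as what was actually run, sends SIGTERM first and falls back to SIGKILL after a grace period. The function name `stop_pids` and the grace period argument are assumptions made for illustration; whether a hung OGG process reacts to SIGTERM at all is not something this post verified.

```shell
# Hypothetical helper: ask each PID to terminate, wait up to $1 seconds,
# then force-kill whatever is still alive.
# Usage: stop_pids GRACE_SECONDS PID...
stop_pids() {
    grace=$1
    shift
    # First pass: polite SIGTERM to every PID (ignore PIDs that no longer exist).
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || true
    done
    # Poll once per second until all PIDs are gone or the grace period ends.
    while [ "$grace" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        grace=$((grace - 1))
    done
    # Last resort: SIGKILL for anything that ignored SIGTERM.
    for pid in "$@"; do
        kill -KILL "$pid" 2>/dev/null || true
    done
    return 0
}
```

For example: `stop_pids 10 7479 7480`, using PIDs taken from the `ps` listing.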

Log in to GGSCI and check the status:

[oracle@localhost OGG]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                          
EXTRACT     ABENDED     EXT12345     00:00:00      unknown    
EXTRACT     ABENDED     EXT67889     00:00:00      unknown    
EXTRACT     ABENDED     PUMP1234     00:00:00      unknown    
EXTRACT     ABENDED     PUMP5678     00:00:00      unknown    
REPLICAT    ABENDED     REP12345     00:00:00      unknown    

The status has changed to ABENDED. Start MANAGER:

GGSCI (localhost.localdomain) 2> start mgr

Manager started.

Check the status again:

GGSCI (localhost.localdomain) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     00:00:00      unknown    
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown

The processes are running again, but Time Since Chkpt is still unknown. Stop one process and check again:

GGSCI (localhost.localdomain) 4> stop EXT12345

Sending STOP request to EXTRACT EXT12345 ...
Request processed.


GGSCI (localhost.localdomain) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     STOPPED     EXT12345     unknown       00:00:02   
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown    

Start the process:

GGSCI (localhost.localdomain) 6> start EXT12345

Sending START request to MANAGER ...
EXTRACT EXT12345 starting


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     unknown       00:00:14   
EXTRACT     RUNNING     EXT67889     00:00:00      unknown    
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown    

The Lag value is now abnormal; give it time to recover. In the meantime, continue by stopping the next process:

GGSCI (localhost.localdomain) 8> stop EXT67889

Sending STOP request to EXTRACT EXT67889 ...

The STOP xxx command can take a while to complete. If a process has to stop immediately, use SEND EXTRACT xxx, FORCESTOP instead.

GGSCI (localhost.localdomain) 9> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     unknown       00:00:02   
EXTRACT     STOPPED     EXT67889     01:51:12      00:00:01   
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      unknown    

Start the process:


GGSCI (localhost.localdomain) 10> start EXT67889

Sending START request to MANAGER ...
EXTRACT EXT67889 starting


GGSCI (localhost.localdomain) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     99:53:02      00:00:01   
EXTRACT     RUNNING     EXT67889     01:51:12      00:00:10   
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown    
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown    
REPLICAT    RUNNING     REP12345     00:00:00      00:00:00   

Continue to STOP and START the remaining processes:

GGSCI (localhost.localdomain) 15> stop PUMP1234

Sending STOP request to EXTRACT PUMP1234 ...
Request processed.


GGSCI (localhost.localdomain) 16> start PUMP1234

Sending START request to MANAGER ...
EXTRACT PUMP1234 starting


GGSCI (localhost.localdomain) 17> stop PUMP5678

Sending STOP request to EXTRACT PUMP5678 ...
Request processed.


GGSCI (localhost.localdomain) 18> start PUMP5678

Sending START request to MANAGER ...
EXTRACT PUMP5678 starting


GGSCI (localhost.localdomain) 19> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     EXT12345     00:00:00      00:00:01   
EXTRACT     RUNNING     EXT67889     00:00:00      00:00:10   
EXTRACT     RUNNING     PUMP1234     00:00:00      00:00:04   
EXTRACT     RUNNING     PUMP5678     00:00:00      00:00:05   
REPLICAT    RUNNING     REP12345     00:00:00      00:00:05   

Everything is back to normal.

Summary:

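First, force MANAGER to stop. Then exit GGSCI and kill the OGG-related processes at the operating-system level. Finally, re-enter GGSCI, start MANAGER, and restart each affected process.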
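
The last step, stopping and starting each group in turn, is repetitive enough to generate. The sketch below only prints the GGSCI commands so they can be reviewed first; `bounce_commands` is a hypothetical name, and feeding the output to `ggsci` on standard input is an assumption about the local setup rather than something done in the session above.

```shell
# Hypothetical helper: print the GGSCI commands that stop and start each
# named group in order, followed by a final status check.
# Usage: bounce_commands GROUP...
bounce_commands() {
    for group in "$@"; do
        printf 'stop %s\n' "$group"
        printf 'start %s\n' "$group"
    done
    printf 'info all\n'
}
```

For example, `bounce_commands EXT12345 EXT67889 PUMP1234 PUMP5678` prints the same stop/start pairs that were typed by hand above. Because STOP can take a while to complete, running the pairs interactively and checking `info all` in between, as the session does, remains the more cautious route.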

About the author
Jason, born in the 1980s, currently works in the telecommunications industry.