Fixing OGG's Time Since Chkpt
2018-04-19
Dalian
Sunny
/oracle/2018/04/19/ogg-tsc.html
oracle
I ran into a problem where Oracle GoldenGate reported Time Since Chkpt as unknown for every process. The steps below fixed it, so I am recording them here.
[oracle@localhost ~]$ ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 20:20:21
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
GGSCI (localhost.localdomain) 1> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 00:00:00 unknown
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
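Spotting the affected groups by eye is easy with five processes, but the same check can be scripted. The following is a minimal sketch and not part of the original procedure: unknown_chkpt_groups is a hypothetical helper name, and it assumes the whitespace-separated column layout shown in the output above.

```bash
# List the groups whose "Time Since Chkpt" column reads "unknown".
# Reads `info all` output on stdin. After whitespace splitting, the fields are:
#   $1 Program, $2 Status, $3 Group, $4 Lag, $5 Time Since Chkpt
# The MANAGER line has only two fields, so it never matches.
unknown_chkpt_groups() {
  awk '($1 == "EXTRACT" || $1 == "REPLICAT") && $5 == "unknown" { print $3 }'
}
```

Assuming ggsci is on the PATH and accepts commands on standard input, it could be used as: echo "info all" | ggsci | unknown_chkpt_groups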
Trying to stop the processes failed:
GGSCI (localhost.localdomain) 2> stop *
Sending STOP request to EXTRACT EXT12345 ...
ERROR: sending message to EXTRACT EXT12345 (Timeout waiting for message).
Sending STOP request to EXTRACT EXT67889 ...
ERROR: sending message to EXTRACT EXT67889 (Timeout waiting for message).
Sending STOP request to EXTRACT PUMP1234 ...
ERROR: sending message to EXTRACT PUMP1234 (Timeout waiting for message).
Sending STOP request to EXTRACT PUMP5678 ...
ERROR: sending message to EXTRACT PUMP5678 (Timeout waiting for message).
Sending STOP request to REPLICAT REP12345 ...
ERROR: sending message to REPLICAT REP12345 (Timeout waiting for message).
Next, try to stop MANAGER:
GGSCI (localhost.localdomain) 3> stop mgr!
Sending STOP request to MANAGER ...
Request processed.
Manager stopped.
Check the status again:
GGSCI (localhost.localdomain) 4> info all
Program Status Group Lag Time Since Chkpt
MANAGER STOPPED
EXTRACT RUNNING EXT12345 00:00:00 unknown
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
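MANAGER is now stopped, but the EXTRACT and REPLICAT processes are still running.
At this point the GGSCI kill command cannot end them, because it requires a running Manager: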
GGSCI (localhost.localdomain) 5> kill EXT12345
ERROR: Manager not currently running.
GGSCI (localhost.localdomain) 6> kill EXT67889
ERROR: Manager not currently running.
Check the status:
GGSCI (localhost.localdomain) 7> info all
Program Status Group Lag Time Since Chkpt
MANAGER STOPPED
EXTRACT RUNNING EXT12345 00:00:00 unknown
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
Exit GGSCI:
GGSCI (localhost.localdomain) 8> exit
List the OGG processes at the operating-system level:
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle 7479 1 0 Nov10 ? 00:03:31 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT12345.prm REPORTFILE /opt/OGG/dirrpt/EXT12345.rpt PROCESSID EXT12345 USESUBDIRS
oracle 7480 1 0 Nov10 ? 00:02:30 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT67889.prm REPORTFILE /opt/OGG/dirrpt/EXT67889.rpt PROCESSID EXT67889 USESUBDIRS
oracle 7483 1 0 Nov10 ? 00:00:01 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP1234.prm REPORTFILE /opt/OGG/dirrpt/PUMP1234.rpt PROCESSID PUMP1234 USESUBDIRS
oracle 7485 1 0 Nov10 ? 00:00:03 /opt/OGG/replicat PARAMFILE /opt/OGG/dirprm/REP12345.prm REPORTFILE /opt/OGG/dirrpt/REP12345.rpt PROCESSID REP12345 USESUBDIRS
oracle 7518 1 0 Nov10 ? 00:00:01 ./server -p 7847 -k -l /opt/OGG/ggserr.log
oracle 7677 1 0 Nov10 ? 00:00:15 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP5678.prm REPORTFILE /opt/OGG/dirrpt/PUMP5678.rpt PROCESSID PUMP5678 USESUBDIRS
oracle 25261 25112 0 24:48 pts/1 00:00:00 grep /opt/OGG
If the command above finds nothing, try searching by group name instead:
ps -ef | grep <replicat name>;
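Copying PIDs by hand from a long listing is error-prone, so the extraction can be scripted. The sketch below is an illustration under stated assumptions: ogg_pids is a hypothetical helper name, the installation directory is passed as an argument, and the ps -ef layout is assumed to match the listing above (PID in column 2, command in column 8, start time printed as a single field).

```bash
# Print the PIDs of processes whose ps -ef line mentions the OGG directory.
# Reads `ps -ef` output on stdin.
# The directory is handed to awk through the environment, not the command
# line, so that awk's own entry in the ps listing does not match.
ogg_pids() {
  [ -n "$1" ] || return 1
  OGG_DIR="$1" awk '$8 != "grep" && index($0, ENVIRON["OGG_DIR"]) > 0 { print $2 }'
}
```

A possible invocation is ps -ef | ogg_pids /opt/OGG. Matching on the whole line, not only the command column, also picks up the ./server collector, which references /opt/OGG only through its log file argument.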
Kill the processes:
[oracle@localhost OGG]$ kill -9 7479 7480 7483 7485 7518 7677
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle 25264 25112 0 24:48 pts/1 00:00:00 grep /opt/OGG
Log in to GGSCI and check the status:
[oracle@localhost OGG]$ ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11
Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.
GGSCI (localhost.localdomain) 1> info all
Program Status Group Lag Time Since Chkpt
MANAGER STOPPED
EXTRACT ABENDED EXT12345 00:00:00 unknown
EXTRACT ABENDED EXT67889 00:00:00 unknown
EXTRACT ABENDED PUMP1234 00:00:00 unknown
EXTRACT ABENDED PUMP5678 00:00:00 unknown
REPLICAT ABENDED REP12345 00:00:00 unknown
The status has changed to ABENDED. Start MANAGER:
GGSCI (localhost.localdomain) 2> start mgr
Manager started.
Check the status again:
GGSCI (localhost.localdomain) 3> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 00:00:00 unknown
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
The processes are running again, but Time Since Chkpt is still unknown. Stop one process and check again:
GGSCI (localhost.localdomain) 4> stop EXT12345
Sending STOP request to EXTRACT EXT12345 ...
Request processed.
GGSCI (localhost.localdomain) 5> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT12345 unknown 00:00:02
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
Start the process:
GGSCI (localhost.localdomain) 6> start EXT12345
Sending START request to MANAGER ...
EXTRACT EXT12345 starting
GGSCI (localhost.localdomain) 7> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 unknown 00:00:14
EXTRACT RUNNING EXT67889 00:00:00 unknown
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
The Lag value is now abnormal; wait for it to recover. In the meantime, continue by stopping the next process:
GGSCI (localhost.localdomain) 8> stop EXT67889
Sending STOP request to EXTRACT EXT67889 ...
The STOP xxx command can take a while to complete. To stop a process immediately, use the SEND EXTRACT xxx, FORCESTOP command.
GGSCI (localhost.localdomain) 9> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 unknown 00:00:02
EXTRACT STOPPED EXT67889 01:51:12 00:00:01
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 unknown
Start the process:
GGSCI (localhost.localdomain) 10> start EXT67889
Sending START request to MANAGER ...
EXTRACT EXT67889 starting
GGSCI (localhost.localdomain) 11> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 99:53:02 00:00:01
EXTRACT RUNNING EXT67889 01:51:12 00:00:10
EXTRACT RUNNING PUMP1234 00:00:00 unknown
EXTRACT RUNNING PUMP5678 00:00:00 unknown
REPLICAT RUNNING REP12345 00:00:00 00:00:00
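The Lag column in the output above mixes HH:MM:SS values such as 99:53:02 with the literal unknown. To compare lag against a threshold in a script, the value can be converted to seconds. This is a minimal sketch: lag_seconds is a hypothetical helper name, and it assumes the Lag format shown in this post.

```bash
# Convert an HH:MM:SS value from the Lag column to seconds.
# Prints nothing and returns non-zero for anything else, such as "unknown".
lag_seconds() {
  case $1 in
    '' | *[!0-9:]*) return 1 ;;
  esac
  # awk treats "08" as decimal 8, which avoids the octal pitfall of $(( )).
  echo "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3; ok = 1 } END { exit !ok }'
}
```

For example, lag_seconds 01:51:12 prints 6672, while lag_seconds unknown prints nothing and fails, so a caller can tell the two cases apart.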
Continue to STOP and START the remaining processes:
GGSCI (localhost.localdomain) 15> stop PUMP1234
Sending STOP request to EXTRACT PUMP1234 ...
Request processed.
GGSCI (localhost.localdomain) 16> start PUMP1234
Sending START request to MANAGER ...
EXTRACT PUMP1234 starting
GGSCI (localhost.localdomain) 17> stop PUMP5678
Sending STOP request to EXTRACT PUMP5678 ...
Request processed.
GGSCI (localhost.localdomain) 18> start PUMP5678
Sending START request to MANAGER ...
EXTRACT PUMP5678 starting
GGSCI (localhost.localdomain) 19> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT12345 00:00:00 00:00:01
EXTRACT RUNNING EXT67889 00:00:00 00:00:10
EXTRACT RUNNING PUMP1234 00:00:00 00:00:04
EXTRACT RUNNING PUMP5678 00:00:00 00:00:05
REPLICAT RUNNING REP12345 00:00:00 00:00:05
Everything is back to normal.
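Summary:
First, force-stop MANAGER. Then exit GGSCI and kill the OGG processes at the operating-system level. Finally, re-enter GGSCI, start MANAGER, and stop and start each affected process.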
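The last step, stopping and starting each affected group, is repetitive, so the GGSCI commands can be generated. The sketch below is only an illustration: restart_commands is a hypothetical helper name, and feeding its output to ggsci on standard input is an assumption about the local setup. It does not wait for a slow STOP to finish, so a group that takes long to stop may still need to be handled interactively.

```bash
# Emit GGSCI commands that stop and then start each named group in turn,
# followed by a final `info all` to review the result.
restart_commands() {
  for group in "$@"; do
    printf 'stop %s\nstart %s\n' "$group" "$group"
  done
  printf 'info all\n'
}
```

A possible invocation is restart_commands EXT12345 EXT67889 PUMP1234 PUMP5678 REP12345 | ggsci. Reviewing the generated commands before piping them is a sensible precaution.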