前文回顾:
《DataX 及 DataX-Web 安装使用详解》
《DataX 源码调试及打包》
《DataX-Web 源码调试及打包》
目前很多主流数据库支持 on duplicate key update(当主键冲突update数据)模式,DataX 也支持通过配置 writeMode 来配置写入模式。但是目前Oracle只支持insert配置项。
如何适配 Oracle 数据库 on duplicate key update 模式,今天大佬超就来带你进行二次开发。
writeMode 的 insert、replace、update 配置项底层采用 INSERT INTOREPLACE INTO/INSERT INTO … ON DUPLICATE KEY UPDATE
语句:
其中 insert into 当主键/唯一性索引冲突时会写不进去冲突的行;后两者没有遇到主键/唯一性索引冲突时与 insert into 行为一致,遇到冲突时会用新行替换原有行所有字段。
oracle 不支持类似 MySQL的 REPLACE INTO
和 INSERT … ON DUPLICATE KEY UPDATE
,所以只支持 insert 配置项。要实现此功能,需要利用 Oracle 的 merge 语句,先来看下 merge 语法。
MERGE INTO [target-table] A USING [source-table sql] B
ON([conditional expression] and [...]...)
WHEN MATCHED THEN[UPDATE sql]
WHEN NOT MATCHED THEN [INSERT sql]
merge 语法其实就是存在就更新,不存在就插入。
示例:
MERGE INTO USERS A USING ( SELECT 18 AS "ID",'chaodev' AS "USER_ID" FROM DUAL ) TMP
ON (TMP."ID" = A."ID" AND TMP."USER_ID" = A."USER_ID" )
WHEN MATCHED THEN
UPDATE SET "USER_NAME" = '大佬超',"USER_PHONE" = '18000000000',"LASTUPDATETIME" = SYSDATE
WHEN NOT MATCHED THEN
INSERT ("ID","USER_ID","USER_NAME","USER_PHONE","LASTUPDATETIME") VALUES(18,'chaodev','大佬超','18000000000',SYSDATE)
所以最终实现原理就是:更改datax 的 oraclewriter 源码,通过 merge into 语句,实现 UPSERT 语义。
涉及修改的类和方法如下:
oraclewriter包:
com.alibaba.datax.plugin.writer.oraclewriter.OracleWriter
:修改允许用户配置 writeMode。
plugin-dbms-util包:
com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil
:增加oralce的逻辑代码。
com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter
:CommonRdbmsWriter.Task类替换 startWriteWithConnection() 、doBatchInsert() 和fillPreparedStatement() 方法。
增加oracle的逻辑代码,如下
/**
* 新增oracle update模块
* @author 程序员大佬超
* @date 20221202
* @param columnHolders
* @param valueHolders
* @param writeMode
* @param dataBaseType
* @param forceUseUpdate
* @return
*/
public static String getWriteTemplate(List columnHolders, List valueHolders,String writeMode, DataBaseType dataBaseType, boolean forceUseUpdate)
{String mode = writeMode.trim().toLowerCase();boolean isWriteModeLegal = mode.startsWith("insert") || mode.startsWith("replace") || mode.startsWith("update");if (!isWriteModeLegal) {throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,String.format("您所配置的 writeMode:%s 错误. 因为DataX 目前仅支持replace,update 或 insert 方式. 请检查您的配置并作出修改.", writeMode));}String writeDataSqlTemplate;if (forceUseUpdate || mode.startsWith("update")) {if (dataBaseType == DataBaseType.MySql || dataBaseType == DataBaseType.Tddl) {writeDataSqlTemplate = new StringBuilder().append("INSERT INTO %s (").append(StringUtils.join(columnHolders, ",")).append(") VALUES(").append(StringUtils.join(valueHolders, ",")).append(")").append(onDuplicateKeyUpdateString(columnHolders)).toString();}else if (dataBaseType == DataBaseType.Oracle) {writeDataSqlTemplate = new StringBuilder().append(onMergeIntoDoString(writeMode, columnHolders, valueHolders)).append("INSERT (").append(StringUtils.join(columnHolders, ",")).append(") VALUES(").append(StringUtils.join(valueHolders, ",")).append(")").toString();}else {throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,String.format("当前数据库不支持 writeMode:%s 模式.", writeMode));}}else {//这里是保护,如果其他错误的使用了update,需要更换为replaceif (writeMode.trim().toLowerCase().startsWith("update")) {writeMode = "replace";}writeDataSqlTemplate = new StringBuilder().append(writeMode).append(" INTO %s (").append(StringUtils.join(columnHolders, ",")).append(") VALUES(").append(StringUtils.join(valueHolders, ",")).append(")").toString();}return writeDataSqlTemplate;
}
调用的方法新增如下,主要就是拼接merge语句:
public static String onMergeIntoDoString(String merge, List columnHolders, List valueHolders) {String[] sArray = getStrings(merge);StringBuilder sb = new StringBuilder();sb.append("MERGE INTO %s A USING ( SELECT ");boolean first = true;boolean first1 = true;StringBuilder str = new StringBuilder();StringBuilder update = new StringBuilder();for (String columnHolder : columnHolders) {if (Arrays.asList(sArray).contains(columnHolder)) {if (!first) {sb.append(",");str.append(" AND ");} else {first = false;}str.append("TMP.").append(columnHolder);sb.append("?");str.append(" = ");sb.append(" AS ");str.append("A.").append(columnHolder);sb.append(columnHolder);}}for (String columnHolder : columnHolders) {if (!Arrays.asList(sArray).contains(columnHolder)) {if (!first1) {update.append(",");} else {first1 = false;}update.append(columnHolder);update.append(" = ");update.append("?");}}sb.append(" FROM DUAL ) TMP ON (");sb.append(str);sb.append(" ) WHEN MATCHED THEN UPDATE SET ");sb.append(update);sb.append(" WHEN NOT MATCHED THEN ");return sb.toString();
}public static String[] getStrings(String merge) {merge = merge.replace("update", "");merge = merge.replace("(", "");merge = merge.replace(")", "");merge = merge.replace(" ", "");return merge.split(",");
}
修改 startWriteWithConnection() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param recordReceiver
* @param taskPluginCollector
* @param connection
*/
public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection)
{this.taskPluginCollector = taskPluginCollector;List mergeColumns = new ArrayList<>();if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {LOG.info("write oracle using {} mode", this.writeMode);List columnsOne = new ArrayList<>();List columnsTwo = new ArrayList<>();String merge = this.writeMode;String[] sArray = WriterUtil.getStrings(merge);for (String s : this.columns) {if (Arrays.asList(sArray).contains(s)) {columnsOne.add(s);}}for (String s : this.columns) {if (!Arrays.asList(sArray).contains(s)) {columnsTwo.add(s);}}int i = 0;for (String column : columnsOne) {mergeColumns.add(i++, column);}for (String column : columnsTwo) {mergeColumns.add(i++, column);}}mergeColumns.addAll(this.columns);// 用于写入数据的时候的类型根据目的表字段类型转换this.resultSetMetaData = DBUtil.getColumnMetaData(connection,this.table, StringUtils.join(mergeColumns, ","));// 写数据库的SQL语句calcWriteRecordSql();List writeBuffer = new ArrayList<>(this.batchSize);int bufferBytes = 0;try {Record record;while ((record = recordReceiver.getFromReader()) != null) {if (record.getColumnNumber() != this.columnNumber) {// 源头读取字段列数与目的表字段写入列数不相等,直接报错throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,String.format("列配置信息有错误. 因为您配置的任务中,源头读取字段数:%s 与 目的表要写入的字段数:%s 不相等. 请检查您的配置并作出修改.",record.getColumnNumber(),this.columnNumber));}writeBuffer.add(record);bufferBytes += record.getMemorySize();if (writeBuffer.size() >= batchSize || bufferBytes >= batchByteSize) {doBatchInsert(connection, writeBuffer);writeBuffer.clear();bufferBytes = 0;}}if (!writeBuffer.isEmpty()) {doBatchInsert(connection, writeBuffer);writeBuffer.clear();}}catch (Exception e) {throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e);}finally {writeBuffer.clear();DBUtil.closeDBResources(null, null, connection);}
}
修改 doBatchInsert() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param connection
* @param buffer
* @throws SQLException
*/
protected void doBatchInsert(Connection connection, List buffer)throws SQLException
{PreparedStatement preparedStatement = null;try {connection.setAutoCommit(false);preparedStatement = connection.prepareStatement(this.writeRecordSql);if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {String merge = this.writeMode;String[] sArray = WriterUtil.getStrings(merge);for (Record record : buffer) {List recordOne = new ArrayList<>();for (int j = 0; j < this.columns.size(); j++) {if (Arrays.asList(sArray).contains(this.columns.get(j))) {recordOne.add(record.getColumn(j));}}for (int j = 0; j < this.columns.size(); j++) {if (!Arrays.asList(sArray).contains(this.columns.get(j))) {recordOne.add(record.getColumn(j));}}for (int j = 0; j < this.columns.size(); j++) {recordOne.add(record.getColumn(j));}for (int j = 0; j < recordOne.size(); j++) {record.setColumn(j, recordOne.get(j));}preparedStatement = fillPreparedStatement(preparedStatement, record);preparedStatement.addBatch();}}else {for (Record record : buffer) {preparedStatement = fillPreparedStatement(preparedStatement, record);preparedStatement.addBatch();}}preparedStatement.executeBatch();connection.commit();}catch (SQLException e) {LOG.warn("回滚此次写入, 采用每次写入一行方式提交. 因为: {}", e.getMessage());connection.rollback();doOneInsert(connection, buffer);}catch (Exception e) {throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e);}finally {DBUtil.closeDBResources(preparedStatement, null);}
}
修改 fillPreparedStatement() 方法
/**
* 更改适配oracle update
* @author 程序员大佬超
* @date 20221202
* @param preparedStatement
* @param record
* @return
* @throws SQLException
*/
protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record)throws SQLException
{for (int i = 0; i < record.getColumnNumber(); i++) {int columnSqltype = this.resultSetMetaData.getMiddle().get(i);preparedStatement = fillPreparedStatementColumnType(preparedStatement, i,columnSqltype, record.getColumn(i));}return preparedStatement;
}
重新打包后,测试job
{"job": {"setting": {"speed": {"channel": 3,"byte": 1048576},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","column": ["`id`","`user_id`","`user_password`","`user_name`","`user_phone`","`email`","`nick_name`","`head_url`","`sex`","`state`","`create_time`","`create_user`","`lastUpdateTime`"],"splitPk": "","connection": [{"table": ["users"],"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/im"]}]}},"writer": {"name": "oraclewriter","parameter": {"username": "yxc","password": "123456","column": ["\"ID\"","\"USER_ID\"","\"USER_PASSWORD\"","\"USER_NAME\"","\"USER_PHONE\"","\"EMAIL\"","\"NICK_NAME\"","\"HEAD_URL\"","\"SEX\"","\"STATE\"","\"CREATE_TIME\"","\"CREATE_USER\"","\"LASTUPDATETIME\""],"writeMode": "update(\"ID\",\"USER_ID\")","connection": [{"table": ["USERS"],"jdbcUrl": "jdbc:oracle:thin:@//192.168.157.142:1521/orcl"}]}}}]}
}
注:writeMode的update括号里字段要加上\"
查看运行日志,可以看到已经正确拼接了MERGE语句:
任务执行成功。切换 insert 和 update 模式检验,无异常。
更多技术干货,请持续关注程序员大佬超。
原创不易,转载请注明出处。