Salesforce 数据清洗

时间:2023-03-10 01:53:07
Salesforce 数据清洗

新系统上线后,需要导入历史数据,但是旧数据格式,数据缺失,数据错误,奇异值,属性归类与新系统有很大的gap。因此我们需要建立一套数据动态清洗规则给Salesforce系统,通过这些规则自动清洗导入数据,清洗规则可以让function自己配置。而不需要IT负责

下面将详细举一个例子如何在salesforce中做数据处理。数据清洗需要分成5个步骤

1,建立2个关联数据的Object的和 一个数据清洗后台设置的Object的
2,数据导入页面csv
3,定义每个字段的范围、属性,如果是错误的则自动重新分配,或者修改成临近值
4,数据清洗合并。
5,导出错误数据到Excel
第一步,新建立两个关联的Recruit 和 Recruit Department, 并且建立一个清洗规则的Object,当导入数据后我们可以读取设置的清洗规则,并对导入的数据进行清洗
Salesforce 数据清洗
Salesforce 数据清洗
Salesforce 数据清洗
第二步,对于清洗规则,我们只能有一条规则被激活,因此我们在插入新规则和更改旧规则的时候,我们需要添加一个tirgger针对Data_Washing_Setting,保证规则的唯一性。
 trigger IsActiveChecking on Data_Washing_Setting__c (before insert,before update) {

     List<Data_Washing_Setting__c> ListOldData =[select Id from Data_Washing_Setting__c
where Active_this_Rule__c = true];
List<Data_Washing_Setting__c> ListNewData =trigger.new; //system.debug('ListNewData:'+ListNewData.size());
integer itemNum = 0;
if(trigger.isInsert)
{
if(trigger.isBefore)
{
for(Data_Washing_Setting__c dws : trigger.new)
{
if(dws.Active_this_Rule__c)
{
itemNum++;
}
}
itemNum +=ListOldData.size(); if(itemNum>1)
{
for(Data_Washing_Setting__c dws : trigger.new){
dws.adderror('only one record can be actived! pls check your history data and try again.');
}
}
}
}
else if(trigger.isUpdate)
{
if(trigger.isBefore)
{
// 去掉更新的数据
for(Data_Washing_Setting__c dws : trigger.new)
{
for(integer i=0;i<ListOldData.size();i++){
if(dws.Id== ListOldData[i].Id)
{
ListOldData.remove(i);
}
}
if(dws.Active_this_Rule__c)
{
itemNum++;
}
}
itemNum +=ListOldData.size();
if(itemNum>1)
{
for(Data_Washing_Setting__c dws : trigger.new){
dws.adderror('only one record can be actived! pls check your history data and try again.');
}
} }
}
}

第三步,我们需要建立导入页面,并添加相应的验证按钮

Salesforce 数据清洗

VF的代码

 <apex:page controller="BatchInsertByCsvController">
<apex:form >
<apex:sectionHeader title="Upload Recruit Data"/>
<apex:pageMessages />
<apex:pageblock >
<center>
<apex:inputFile value="{!contentFile}" fileName="{!fileName}" />
<apex:commandButton action="{!LoadData}" value="Batch Insert"/>
<apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>
<apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/> </center>
</apex:pageblock>
<apex:pageBlock title="Import Data">
<apex:pageblocktable value="{!RecruitList}" var="ReList">
<apex:column value="{!ReList.Name}" />
<apex:column value="{!ReList.Position_Name__c}" />
<apex:column value="{!ReList.Recruit_Department__c}" />
<apex:column value="{!ReList.Recruit_Type__c}" />
<apex:column value="{!ReList.Recruit_Number__c}" />
</apex:pageblocktable>
</apex:pageBlock>
<apex:pageBlock title="Blank Data">
<apex:pageblocktable value="{!BlankList}" var="BList">
<apex:column value="{!BList.Name}" />
<apex:column value="{!BList.Position_Name__c}" />
<apex:column value="{!BList.Recruit_Department__c}" />
<apex:column value="{!BList.Recruit_Type__c}" />
<apex:column value="{!BList.Recruit_Number__c}" />
</apex:pageblocktable>
</apex:pageBlock>
</apex:form>
</apex:page>

后台APEX 导入代码

 public class BatchInsertByCsvController {

     public string fileName{get;set;}
//Blob:二进制对象类型。通过inputFile选中后的文件在后台获取的时候是一个Blob类型,
public Blob contentFile{get;set;}
public String[] filelines = new String[]{};
public List<Recruit__c> RecruitList{get;set;}
public List<Recruit__c> BlankList{get;set;}
public List<Recruit__c> invaildList{get;set;}
//初始化
public PageReference LoadData()
{
try{
filename = bitToString(contentFile,'ISO-8859-1');
filelines = fileName.split('\n');
// ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'import account:'+filelines.size());
// ApexPages.addMessage(msgs);
RecruitList = new List<Recruit__c>();
string[] inputvalues;
string SwpNumber; for(Integer i=1;i<filelines.size();i++)
{
inputvalues = new string[]{};
inputvalues = filelines[i].split(',');
Recruit__c recruits = new Recruit__c();
recruits.Name = inputvalues[0];
recruits.Position_Name__c = inputvalues[1];
recruits.Recruit_Department__c = [SELECT Id
FROM Recruit_Department__c
WHERE Name =:inputvalues[2] LIMIT 1].Id;
recruits.Recruit_Type__c = inputvalues[3];
SwpNumber = inputvalues[4];
recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim());
RecruitList.add(recruits);
}
}
catch(exception e){
ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage());
ApexPages.addMessage(errormsg);
}
try{
// insert RecruitList;
// ApexPages.Message successMsg = new ApexPages.Message(ApexPages.severity.INFO,'import success');
// ApexPages.addMessage(successMsg);
}
catch(Exception e)
{
//ApexPages.Message errormsg = new ApexPages.Message(ApexPages.severity.ERROR,'An error has occured inserting the records'+e.getMessage());
//ApexPages.addMessage(errormsg);
}
return null;
}
//blob是二进制存储的,String是16进制存储的,所以使用此种方式加上编码解码等操作肯定会更加适应,包括中文
private String bitToString(Blob input, String inCharset){
//转换成16进制
String hex = EncodingUtil.convertToHex(input);
//一个String类型两个字节 32位(bit),则一个String长度应该为两个16进制的长度,所以此处向右平移一个单位,即除以2
//向右平移一个单位在正数情况下等同于除以2,负数情况下不等
//eg 9 00001001 >>1 00000100 结果为4
final Integer bytesCount = hex.length() >> 1;
//声明String数组,长度为16进制转换成字符串的长度
String[] bytes = new String[bytesCount];
for(Integer i = 0; i < bytesCount; ++i) {
//将相邻两位的16进制字符串放在一个String中
bytes[i] = hex.mid(i << 1, 2);
}
//解码成指定charset的字符串
return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset);
}
//筛选空值
public PageReference LoadBlankList()
{
try
{
BlankList=new list<Recruit__c>();
DataWashingSetting dws=new DataWashingSetting();
string[] flines = dws.AddQuestionsData(filelines);
string[] inputvalues;
string SwpNumber; for(Integer i=0;i<flines.size();i++)
{
inputvalues = new string[]{};
inputvalues = flines[i].split(',');
Recruit__c recruits = new Recruit__c();
recruits.Name = inputvalues[0];
recruits.Position_Name__c = inputvalues[1];
recruits.Recruit_Department__c = [SELECT Id
FROM Recruit_Department__c
WHERE Name =:inputvalues[2] LIMIT 1].Id;
recruits.Recruit_Type__c = inputvalues[3];
SwpNumber = inputvalues[4];
recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim());
BlankList.add(recruits);
}
ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'blank num:'+BlankList.size());
ApexPages.addMessage(msgs);
}
catch(Exception e)
{
ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage());
ApexPages.addMessage(errormsg);
}
return null;
}
public PageReference ExportBlankToCSV()
{
return new PageReference('/apex/ExportCSV');
}
}

后台调用的验证清洗代码,可以根据需要任意添加

 public class DataWashingSetting {

     //消除重复数据
public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList)
{
set<Recruit__c> myset= new set<Recruit__c>();
List<Recruit__c> result = new List<Recruit__c>(); myset.addAll(OriginalList);
result.addAll(myset); return result;
}
//筛选为空数据
public string[] AddQuestionsData(string[] filelines)
{
string[] result =new string[]{};
string[] inputvalues;
for(Integer i=1;i<filelines.size();i++)
{
inputvalues = new string[]{};
inputvalues = filelines[i].split(',');
if(inputvalues[0] == ''||inputvalues[1] == '' ||inputvalues[2] == ''
||inputvalues[3] == '' ||inputvalues[4] == '')
{
result.add(filelines[i]);
}
}
return result;
}
//检测各个字段的合理性
public string[] CheckFiled(string[] filelines)
{
//读取规则
Data_Washing_Setting__c dws = [select Position_Name_Rule__c,
Recruit_End_Number__c,Recruit_Department_Rule__c,Recruit_Start_Number__c from Data_Washing_Setting__c where Active_this_Rule__c = true];
string PositionNameRule = dws.Position_Name_Rule__c; //部门规则是否允许重复
decimal startNumber= dws.Recruit_Start_Number__c; //招聘人数底线
decimal endNumber= dws.Recruit_End_Number__c; //招聘人数上线
string department = dws.Recruit_Department_Rule__c;//部门限制 string[] result =new string[]{};
string[] inputvalues;
for(Integer i=1;i<filelines.size();i++)
{
inputvalues = new string[]{};
inputvalues = filelines[i].split(',');
//填写验证代码
}
return result; //返回不合格代码
}
}

出现问题数据直接导出问题数据到Excel,手动处理后再导入。

 <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel# BlankList.xls" showHeader="false">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<apex:pageBlock >
<apex:pageblocktable value="{!BlankList}" var="BList">
<apex:column value="{!BList.Name}" />
<apex:column value="{!BList.Position_Name__c}" />
<apex:column value="{!BList.Recruit_Department__c}" />
<apex:column value="{!BList.Recruit_Type__c}" />
<apex:column value="{!BList.Recruit_Number__c}" />
</apex:pageblocktable>
</apex:pageBlock>
</apex:page>

下面就是最终效果:

1,导入数据,自动筛选有缺失值的数据,并支持Excel导出

Salesforce 数据清洗

2,后台清洗的规则设置。

Salesforce 数据清洗