Recognizing Chinese Character Click CAPTCHAs

2023-03-27 14:44 | Source: compiled from the web

Before starting, make sure Selenium and Firefox are installed and configured.

1. Getting the CAPTCHA image

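The CAPTCHA image is displayed only while the mouse pointer is over it. We can therefore hover the mouse to make the CAPTCHA appear, then capture the image and save it locally.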

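There are two ways to hover: move to specific coordinates, or move to a specific element. We choose the former for two reasons. First, once the CAPTCHA has been recognized we have to click specific coordinates anyway, because the individual characters have no corresponding elements. Second, every mouse action moves relative to the position left by the previous one, so we need to keep track of coordinates (including the coordinates of the element).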

    from selenium import webdriver
    import time
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get('http://www.xxx.com')
    time.sleep(5)  # wait for the page to load

    # ActionChains(driver).move_to_element(ele).perform()  # hover over an element
    ActionChains(driver).move_by_offset(762.5, 447.76).perform()  # hover at specific coordinates so the CAPTCHA appears
    img = driver.find_element(By.XPATH, '//*[@id="jcaptchaimage"]')  # locate the CAPTCHA image
    img.screenshot('1.png')  # screenshot the CAPTCHA and save it as '1.png'
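Because each offset move is relative to wherever the pointer already is, it can help to model the pointer position explicitly. The following is a minimal sketch in plain Python; `PointerTracker` is a hypothetical helper written for illustration and is not part of Selenium. It only mirrors the bookkeeping that relative moves require.

```python
class PointerTracker:
    """Hypothetical helper: models a pointer that only accepts relative moves."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def move_by_offset(self, dx, dy):
        # A relative move adds the offset to the current position.
        self.x += dx
        self.y += dy
        return self

    def offset_to(self, target_x, target_y):
        # Offset needed to reach an absolute position from the current one.
        return (target_x - self.x, target_y - self.y)


pointer = PointerTracker()
pointer.move_by_offset(100, 50)
pointer.move_by_offset(20, 10)    # moves from (100, 50), not from (0, 0)
position = (pointer.x, pointer.y)
next_offset = pointer.offset_to(150, 100)
```

Keeping such a record next to the real `ActionChains` calls makes it easier to see which offset is needed for the next move.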

2. Recognizing the CAPTCHA

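The characters in the CAPTCHA are recognized with Tencent Cloud OCR (the image must be passed either as an image URL or as a base64-encoded string), which returns the recognized text and its position. The position is given as the four vertices of each piece of text, relative to the CAPTCHA image, so we can write a function that computes the center of each character (relative to the top-left corner of the CAPTCHA).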

For details, see the official documentation: https://cloud.tencent.com/product/ocr/developer

    # Requires the SDK (pip install tencentcloud-sdk-python)
    import base64
    import json

    from tencentcloud.common import credential
    from tencentcloud.common.profile.client_profile import ClientProfile
    from tencentcloud.common.profile.http_profile import HttpProfile
    from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
    from tencentcloud.ocr.v20181119 import ocr_client, models


    # Read the image, base64-encode it, and return the recognition result
    def tencent_OCR():
        '''
        Output: recognized text and position information for the CAPTCHA
        '''
        # Read the saved image and convert it to base64
        with open('1.png', 'rb') as f:
            base64_data = base64.b64encode(f.read())
            s = base64_data.decode()
        try:
            # Enter your Tencent Cloud SecretId and SecretKey
            cred = credential.Credential("SecretId", "SecretKey")
            httpProfile = HttpProfile()
            httpProfile.endpoint = "ocr.tencentcloudapi.com"

            clientProfile = ClientProfile()
            clientProfile.httpProfile = httpProfile
            client = ocr_client.OcrClient(cred, "ap-beijing", clientProfile)

            req = models.GeneralBasicOCRRequest()
            params = json.dumps({"ImageBase64": s})  # base64 encoding of the image
            req.from_json_string(params)
            resp = client.GeneralBasicOCR(req)
            res = resp.to_json_string()  # response as a JSON string
            return res
        except TencentCloudSDKException as err:
            print('CAPTCHA recognition error', err)
            return 'error'


    # Compute one center coordinate (helper function)
    def cen_location(coo):
        '''
        Input: the coordinates of the four vertices of one character
        Output: the center coordinates of that character
        '''
        x = 0
        y = 0
        for i in coo:
            x = x + i['X']
            y = y + i['Y']
        return (x / 4, y / 4)  # average of the four vertices


    # Compute the center coordinates of all recognized characters
    def texts_and_locations(img_str):
        '''
        Input: the response returned for the CAPTCHA
        Output: list of recognized characters and list of their coordinates
        '''
        img_str = json.loads(img_str)
        TextDetections = img_str['TextDetections']
        texts = []          # all recognized characters
        cen_locations = []  # center of each character relative to the CAPTCHA
        for t_d in TextDetections:
            texts.append(t_d['DetectedText'])
            x, y = cen_location(t_d['Polygon'])
            cen_locations.append([x, y])
        return (texts, cen_locations)
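The recognition step returns every character it finds, but only the characters named in the CAPTCHA prompt need to be clicked, and in the prompted order. The sketch below shows one way to select them. `pick_targets` and the sample data are hypothetical, and the sketch assumes that each entry in `texts` is a single character; if the OCR service returns several characters in one detection, the entries would have to be split first.

```python
def pick_targets(texts, cen_locations, target_chars):
    """Return the centers of target_chars in click order (hypothetical helper).

    Assumes each entry of texts is a single character.
    Raises ValueError if a target character was not recognized.
    """
    lookup = {}
    for text, loc in zip(texts, cen_locations):
        lookup.setdefault(text, loc)  # keep the first occurrence of each character
    missing = [c for c in target_chars if c not in lookup]
    if missing:
        raise ValueError('characters not recognized: ' + ''.join(missing))
    return [lookup[c] for c in target_chars]


# Made-up sample data in the shape returned by texts_and_locations()
texts = ['天', '地', '人', '和']
cen_locations = [[30.0, 40.0], [90.0, 35.0], [150.0, 60.0], [210.0, 45.0]]

by_cen_locations = pick_targets(texts, cen_locations, '人天地')
```

Raising an error when a character is missing gives the caller a chance to refresh the CAPTCHA and retry instead of clicking a wrong position.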

Baidu AI Cloud can also be used for text recognition:

For details, see the official documentation: https://cloud.baidu.com/doc/OCR/OCR-Python-SDK.html
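If a recognition service reports each piece of text as an axis-aligned box rather than as four vertices, the center calculation changes slightly. The sketch below assumes a response shaped like `{'words_result': [{'words': ..., 'location': {'left', 'top', 'width', 'height'}}]}`; this shape is an assumption made for illustration and should be checked against the documentation linked above. The function names are hypothetical.

```python
def box_center(location):
    """Center of an axis-aligned box given as left/top/width/height."""
    return (location['left'] + location['width'] / 2,
            location['top'] + location['height'] / 2)


def texts_and_locations_from_boxes(result):
    """Hypothetical counterpart of texts_and_locations() for box-style results."""
    texts = []
    cen_locations = []
    for item in result.get('words_result', []):
        texts.append(item['words'])
        cen_locations.append(list(box_center(item['location'])))
    return (texts, cen_locations)


# Made-up sample in the assumed response shape
sample = {
    'words_result': [
        {'words': '天', 'location': {'left': 10, 'top': 20, 'width': 40, 'height': 40}},
        {'words': '地', 'location': {'left': 70, 'top': 15, 'width': 40, 'height': 40}},
    ]
}
box_texts, box_centers = texts_and_locations_from_boxes(sample)
```

The output has the same form as that of the vertex-based version, so the click logic does not need to know which service produced it.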

3. Position handling

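Because every move is made relative to the previous position, we can write a function that, after each move, performs the opposite move to restore the original position. We then click the center positions obtained in step 2. Note: the center coordinates are relative to the top-left corner of the CAPTCHA, so before clicking the recognized characters the pointer must first be moved to the top-left corner of the CAPTCHA (that is, to the position of that corner on the page).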

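    from selenium.webdriver.common.action_chains import ActionChains


    # Click a single character
    def mo_click(driver, xy, left_click=True):
        '''
        Input: driver, the coordinates of one character
        Output: a click at that position
        '''
        if left_click:
            ActionChains(driver).move_by_offset(xy[0], xy[1]).click().perform()  # left click
        else:
            ActionChains(driver).move_by_offset(xy[0], xy[1]).context_click().perform()  # right click
        ActionChains(driver).move_by_offset(-xy[0], -xy[1]).perform()  # restore the mouse position


    def simulation_clicks(driver, by_cen_locations):
        '''
        Input: list of coordinates of the three characters to click
        Output: clicks on the CAPTCHA characters
        '''
        for xy in by_cen_locations:
            mo_click(driver, xy)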
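The offset that brings the pointer to the top-left corner of the CAPTCHA depends on where the pointer currently is. The sketch below shows the arithmetic only. The function names and the numbers are hypothetical, and it assumes the corner is available as a dict with `x` and `y` keys (the shape of Selenium's `WebElement.location`) in the same coordinate system as the pointer position, which may not hold if the page has been scrolled.

```python
def offset_to_captcha_origin(pointer_xy, captcha_location):
    """Offset that moves the pointer from pointer_xy to the CAPTCHA's top-left corner."""
    return (captcha_location['x'] - pointer_xy[0],
            captcha_location['y'] - pointer_xy[1])


def absolute_click_points(captcha_location, cen_locations):
    """Convert centers relative to the CAPTCHA into page coordinates."""
    return [(captcha_location['x'] + x, captcha_location['y'] + y)
            for x, y in cen_locations]


# Made-up values: the pointer was left over the CAPTCHA by the hover step
pointer_xy = (762, 448)
captcha_location = {'x': 700, 'y': 400}

origin_offset = offset_to_captcha_origin(pointer_xy, captcha_location)
points = absolute_click_points(captcha_location, [[30.0, 40.0], [150.0, 60.0]])
```

With the pointer at the corner, each center can be passed to `mo_click` unchanged, since that function restores the pointer to the corner after every click.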
