Author: Leo
Edited: Wed Jan 1, 2020 at 10:13
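Regular expressions: using the re module
Building a pattern object
p = re.compile(reg_str, re.IGNORECASE | re.MULTILINE)
This builds a pattern object from the pattern string reg_str, with case-insensitive matching and multiline mode enabled.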
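To make the effect of the two flags concrete, here is a minimal sketch. The sample text and the variable names are made up for illustration, and the code uses print() calls, so it assumes Python 3.

```python
import re

# Illustrative three-line input; not taken from the notes above.
text = "Hello\nhello world\nHELLO again"

# No flags: ^ anchors only at the start of the whole string,
# and matching is case-sensitive, so nothing is found.
plain = re.compile(r"^hello")
print(plain.findall(text))        # []

# IGNORECASE only: case no longer matters, but ^ still means
# "start of the string", so only the first line can match.
ignore_case = re.compile(r"^hello", re.IGNORECASE)
print(ignore_case.findall(text))  # ['Hello']

# IGNORECASE | MULTILINE: ^ now also matches right after each newline.
both = re.compile(r"^hello", re.IGNORECASE | re.MULTILINE)
print(both.findall(text))         # ['Hello', 'hello', 'HELLO']
```

MULTILINE only changes what ^ and $ mean, so it makes no difference for a pattern that uses neither anchor.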
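match
match succeeds only if the pattern matches starting at the very beginning of the target string. It does not require the pattern to cover the whole string.
search
search scans the target string for the first substring that matches the regular expression (similar to PHP's preg_match).
Example: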
# -*- coding: utf-8 -*-
'''
Regular expressions: match vs. search
'''
import re

target_str = '<img src="http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.715038,139.796799&sensor=false">'
# The pipe is escaped (\|) so that it matches a literal "|" instead of acting as alternation.
reg_str = r"http://maps\.google\.cn.*\|(.*?),(.*?)&sensor=false"
pattern = re.compile(reg_str, re.IGNORECASE | re.MULTILINE)
print(pattern)

# search finds the URL even though target_str starts with '<img src="'.
searchobj = pattern.search(target_str)
if searchobj:
    print(searchobj)
    print(searchobj.group())
    print(searchobj.groups())
else:
    print('no match')

# match fails here because the pattern does not match at position 0.
matchobj = pattern.match(target_str)
# print(matchobj)  # None
if matchobj:
    print(matchobj)
else:
    print('no match')

# target_str2 starts with the URL itself, so match succeeds.
target_str2 = 'http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.715038,139.796799&sensor=false'
pattern2 = re.compile(reg_str)
matchobj2 = pattern2.match(target_str2)
if matchobj2:
    print(matchobj2)
    print(matchobj2.group())
    print(matchobj2.groups())
else:
    print('no match')
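Note:
The first match call above (on target_str) cannot succeed, because target_str begins with <img src=" rather than with the URL. search has to be used there.
This is the difference between match and search: match only tries the pattern at the start of the string, which suits validating input such as phone numbers or email addresses (add an end anchor such as $ or \Z, or use fullmatch, if the whole string must match), while search suits pulling a needed substring out of something like an HTML document.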
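The validation case is where the start-only anchoring of match matters most. The sketch below uses a deliberately simplified, hypothetical phone pattern (exactly 11 digits) and made-up input. fullmatch is available from Python 3.4 onward, so this assumes Python 3; on Python 2 the explicit \Z anchor is the option to use.

```python
import re

# Hypothetical, simplified phone pattern: exactly 11 digits.
phone = re.compile(r"\d{11}")

candidate = "13800138000abc"   # 11 digits followed by junk

# match() anchors only at the start, so the trailing junk is accepted.
print(bool(phone.match(candidate)))            # True

# fullmatch() requires the pattern to cover the whole string.
print(bool(phone.fullmatch(candidate)))        # False
print(bool(phone.fullmatch("13800138000")))    # True

# An explicit end anchor gives the same strictness with match().
anchored = re.compile(r"\d{11}\Z")
print(bool(anchored.match(candidate)))         # False

# search() looks for the pattern anywhere in the string.
found = phone.search("tel: 13800138000")
print(found.group())                           # 13800138000
```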
findall
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
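findall returns a list.
When the pattern contains a group, the list holds the text matched by that group rather than the whole match.
When the pattern contains several groups, each match yields a tuple of its groups, and the result is a list of those tuples.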
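The three return shapes described above can be seen side by side in a small sketch. The input string and variable names are invented for illustration, and print() is used as a function, so Python 3 is assumed.

```python
import re

# Illustrative input; not taken from the notes.
text = "x=1, y=22, z=333"

# No group: a list of the whole matches.
no_group = re.findall(r"\w=\d+", text)
print(no_group)     # ['x=1', 'y=22', 'z=333']

# One group: a list of what that group captured.
one_group = re.findall(r"\w=(\d+)", text)
print(one_group)    # ['1', '22', '333']

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w)=(\d+)", text)
print(two_groups)   # [('x', '1'), ('y', '22'), ('z', '333')]
```

If the whole match is needed alongside the groups, a non-capturing group (?:...) or finditer avoids the change in return shape.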
finditer
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
Returns an iterator of MatchObject instances. This is roughly the counterpart of PHP's preg_match_all.
Example:
# -*- coding: utf-8 -*-
'''
Regular expressions: finditer and findall
'''
import re

# Triple quotes are required because the string spans several lines.
target_str = '''
<img src="http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.715038,139.796799&sensor=false">
<img src="http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.7138,13.796799&sensor=false">
'''
# The pipe is escaped (\|) so that it matches a literal "|" instead of acting as alternation.
reg_str = r"http://maps\.google\.cn.*?\|(.*?),(.*?)&sensor=false"

f_iter = re.finditer(reg_str, target_str, re.I)
print(f_iter)
for item in f_iter:
    print(item.group())
    print(item.groups())
'''
<callable-iterator object at 0x0000000002184588>
http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.715038,139.796799&sensor=false
('35.715038', '139.796799')
http://maps.google.cn/maps/api/staticmap?zoom=12&size=270x180&markers=icon:http://static.qyer.com/images/place5/icon_mapno_big.png|35.7138,13.796799&sensor=false
('35.7138', '13.796799')
'''

f_all = re.findall(reg_str, target_str, re.I)
print(f_all)
'''
[('35.715038', '139.796799'), ('35.7138', '13.796799')]
'''
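Because finditer yields match objects rather than plain strings, each hit also carries its position and any named groups. The following is only a sketch under assumptions: the input is a shortened, hypothetical stand-in for the URLs above (example.com is a placeholder), the group names lat and lng are invented here, and Python 3 is assumed.

```python
import re

# Shortened, hypothetical input in the same shape as the URLs above.
html = (
    '<img src="http://example.com/map?markers=icon.png|35.715038,139.796799&sensor=false">\n'
    '<img src="http://example.com/map?markers=icon.png|35.7138,13.796799&sensor=false">\n'
)

# The pipe is escaped so it matches a literal "|"; lat and lng are
# illustrative group names, not something the original pattern used.
coord_re = re.compile(r"\|(?P<lat>[-\d.]+),(?P<lng>[-\d.]+)&sensor=false")

coords = []
for m in coord_re.finditer(html):
    # m.span() gives the (start, end) offsets of this match in html.
    print(m.span(), m.group("lat"), m.group("lng"))
    coords.append((float(m.group("lat")), float(m.group("lng"))))

print(coords)  # [(35.715038, 139.796799), (35.7138, 13.796799)]
```

findall is the shorter choice when only the captured strings are needed. finditer is worth the extra loop when offsets or named groups matter, or when the input is large and a full list is not wanted up front.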