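I recently needed to use Python to parse the XML documents returned by the Digu microblog API, so I am recording the steps here.
What follows is only part of the code. The real program also includes exception handling and other steps, all of which are omitted here; only the most basic XML parsing is kept.
I hope it is of some help.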
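The XML file is as follows. The element tags did not survive in this listing, so only the text values remain, shown here one per line. Judging by the element names the parsing code reads, the first three values appear to be created_at, id, and text, and the three URLs appear to be picPath elements:
    Fri Oct 22 10:02:35 +0800 2010
    47105283
    测试
    http://img0.digu.com/100x75/u/1287712955201_053b159e91c7860c76fbdf83aaae408f.jpg
    http://img1.digu.com/100x75/u/1287712955316_f50231858802876daf547866a5ca4137.jpg
    http://img1.digu.com/100x75/u/1287712955427_87b5d9d02414b1da79536b57d65ba03e.jpg
    GO浏览器
    false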
Part of the code that parses this XML file is shown below:
    from xml.dom.minidom import parse

    def getText(nodelist):
        # Concatenate the data of every text node in the node list
        rc = ""
        for node in nodelist:
            if node.nodeType == node.TEXT_NODE:
                rc = rc + node.data
        return rc

    def parseSingleListToDict():
        # A dict keyed by record ID (a list cannot be indexed by an ID string)
        diguList = {}
        dom = parse('myfile.xml')
        statuses_element = dom.getElementsByTagName('statuses')[0]
        status_element = statuses_element.getElementsByTagName('status')
        for singleStatus in status_element:
            # Record ID
            id_element = singleStatus.getElementsByTagName('id')[0]
            diguId = getText(id_element.childNodes)
            # Content
            text_element = singleStatus.getElementsByTagName('text')[0]
            diguText = getText(text_element.childNodes)
            # Publication time
            pubdate_element = singleStatus.getElementsByTagName('created_at')[0]
            diguPubdate = getText(pubdate_element.childNodes)
            # Picture list
            pic_element = singleStatus.getElementsByTagName('picPath')
            picList = []
            for singlePic in pic_element:
                picList.append(getText(singlePic.childNodes))
            diguList[diguId] = {'id': diguId, 'text': diguText, 'pubdate': diguPubdate, 'pic': picList}
        return diguList
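To try the same approach without a file on disk, here is a minimal sketch that parses an in-memory string with `parseString`. The sample document is an assumption: the element names `statuses`, `status`, `id`, `text`, `created_at` and `picPath` come from the parsing code, but the exact nesting is reconstructed rather than copied from the API, and the fields whose element names are unknown are left out. The helpers `get_text`, `first_text` and `parse_statuses` are hypothetical names introduced only for this illustration.

```python
from xml.dom.minidom import parseString

# Assumed sample document: element names are taken from the parsing code,
# the nesting is a guess, and fields with unknown element names are omitted.
SAMPLE_XML = """<statuses>
  <status>
    <created_at>Fri Oct 22 10:02:35 +0800 2010</created_at>
    <id>47105283</id>
    <text>测试</text>
    <picPath>http://img0.digu.com/100x75/u/1287712955201_053b159e91c7860c76fbdf83aaae408f.jpg</picPath>
    <picPath>http://img1.digu.com/100x75/u/1287712955316_f50231858802876daf547866a5ca4137.jpg</picPath>
    <picPath>http://img1.digu.com/100x75/u/1287712955427_87b5d9d02414b1da79536b57d65ba03e.jpg</picPath>
  </status>
</statuses>"""


def get_text(nodelist):
    """Join the data of all text nodes in a node list."""
    return "".join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


def first_text(parent, tag):
    """Return the text of the first <tag> descendant, or "" if there is none."""
    elements = parent.getElementsByTagName(tag)
    return get_text(elements[0].childNodes) if elements else ""


def parse_statuses(xml_string):
    """Build a dict that maps each status ID to its parsed fields."""
    dom = parseString(xml_string)
    result = {}
    for status in dom.getElementsByTagName("status"):
        status_id = first_text(status, "id")
        result[status_id] = {
            "id": status_id,
            "text": first_text(status, "text"),
            "pubdate": first_text(status, "created_at"),
            "pic": [get_text(p.childNodes)
                    for p in status.getElementsByTagName("picPath")],
        }
    return result


statuses = parse_statuses(SAMPLE_XML)
```

Note that `getElementsByTagName` searches all descendants, not only direct children. If a real response nests another element that also contains an `id` inside `status`, the first match may not be the status ID, so the lookup would need to be restricted to direct children.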