I recently needed to use Python to parse the XML documents returned by the Digu microblog API, so I am recording the steps here.

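What follows is only part of the code. The actual program also includes exception handling and other steps, all of which are omitted here; only the most basic XML parsing part is kept.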
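The omitted exception handling is not reproduced in this post. As a rough idea only, a wrapper around minidom's `parse` could look like the sketch below. `load_dom` is a hypothetical helper name, and catching `OSError` and `ExpatError` is an assumption about which failures matter (an unreadable file and malformed XML), not the original program's logic.

```python
from xml.dom.minidom import parse
from xml.parsers.expat import ExpatError


def load_dom(path):
    """Return the parsed document, or None if the file is unreadable or malformed."""
    try:
        return parse(path)
    except (OSError, ExpatError) as exc:
        # OSError covers a missing or unreadable file;
        # ExpatError is what minidom raises for malformed XML.
        print("could not parse %s: %s" % (path, exc))
        return None
```

Returning `None` keeps the caller simple; a real program might prefer to log the error and re-raise it.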

I hope it is of some help.

The XML file is as follows:

<statuses>
	<status>
		<created_at>Fri Oct 22 10:02:35 +0800 2010</created_at>
		<id>47105283</id>
		<text>测试</text>
		<picPath>http://img0.digu.com/100x75/u/1287712955201_053b159e91c7860c76fbdf83aaae408f.jpg</picPath>
		<picPath>http://img1.digu.com/100x75/u/1287712955316_f50231858802876daf547866a5ca4137.jpg</picPath>
		<picPath>http://img1.digu.com/100x75/u/1287712955427_87b5d9d02414b1da79536b57d65ba03e.jpg</picPath>
		GO浏览器
		false
	</status>
</statuses>

Part of the code that parses this XML file is as follows:

from xml.dom.minidom import parse


def getText(nodelist):
	"""Concatenate the data of every text node in nodelist."""
	rc = ""
	for node in nodelist:
		if node.nodeType == node.TEXT_NODE:
			rc = rc + node.data
	return rc
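

def parseSingleListToDict():
	"""Parse myfile.xml into a dict of statuses keyed by status ID."""
	# A dict, not a list: entries are stored under their status ID below.
	diguList = {}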
	dom = parse('myfile.xml')
	statuses_element = dom.getElementsByTagName('statuses')[0]
	status_element = statuses_element.getElementsByTagName('status')
	for singleStatus in status_element:
		# Status ID
		id_element = singleStatus.getElementsByTagName('id')[0]
		diguId = getText(id_element.childNodes)
		# Content
		text_element = singleStatus.getElementsByTagName('text')[0]
		diguText = getText(text_element.childNodes)
		# Publication time
		pubdate_element = singleStatus.getElementsByTagName('created_at')[0]
		diguPubdate = getText(pubdate_element.childNodes)
		# Picture list
		pic_element = singleStatus.getElementsByTagName('picPath')
		picList = []
		for singlePic in pic_element:
			picList.append(getText(singlePic.childNodes))
		diguList[diguId] = {'id': diguId, 'text': diguText, 'pubdate': diguPubdate, 'pic': picList}
	return diguList
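
To try the same approach without a file on disk, the sketch below parses an inline string with `parseString` and tolerates missing elements. It is an illustration under assumptions, not the original program: `SAMPLE_XML` is a cut-down sample with shortened, made-up picture URLs, and `get_text`, `first_text` and `parse_statuses` are hypothetical helper names.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: only the elements the parsing code reads,
# with shortened placeholder picture URLs.
SAMPLE_XML = """<statuses>
  <status>
    <created_at>Fri Oct 22 10:02:35 +0800 2010</created_at>
    <id>47105283</id>
    <text>test</text>
    <picPath>http://img0.digu.com/100x75/u/a.jpg</picPath>
    <picPath>http://img1.digu.com/100x75/u/b.jpg</picPath>
  </status>
</statuses>"""


def get_text(element):
    """Join the data of the text nodes directly under element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def first_text(parent, tag):
    """Return the text of the first <tag> under parent, or "" if there is none."""
    found = parent.getElementsByTagName(tag)
    return get_text(found[0]) if found else ""


def parse_statuses(xml_string):
    """Return a dict mapping each status ID to its fields."""
    dom = parseString(xml_string)
    result = {}
    for status in dom.getElementsByTagName("status"):
        status_id = first_text(status, "id")
        result[status_id] = {
            "id": status_id,
            "text": first_text(status, "text"),
            "pubdate": first_text(status, "created_at"),
            "pic": [get_text(p) for p in status.getElementsByTagName("picPath")],
        }
    return result


statuses = parse_statuses(SAMPLE_XML)
print(statuses["47105283"]["pic"])
```

Checking the result of `getElementsByTagName` before indexing avoids an `IndexError` when a status lacks one of the expected elements.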