#1 30.08.09 23:01
Python3, html parsing
Hi everyone,
Given a tag "tag_name":
<tag_name> some data </tag_name>
how do I get "some data"?
As an example, take the page: http://tycho.usno.navy.mil/cgi-bin/timer.pl
I fetched the page itself like this:
import httplib2
h = httplib2.Http(".cache")
response, content = h.request('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
html_file = content.decode("utf-8")
print(html_file)
Sample page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final"//EN>
<html>
<body>
<TITLE>What time is it?</TITLE>
<H2> US Naval Observatory Master Clock Time</H2> <H3><PRE>
<BR>Aug. 30, 15:48:31 UTC Universal Time
<BR>Aug. 30, 11:48:31 AM EDT Eastern Time
<BR>Aug. 30, 10:48:31 AM CDT Central Time
<BR>Aug. 30, 09:48:31 AM MDT Mountain Time
<BR>Aug. 30, 08:48:31 AM PDT Pacific Time
<BR>Aug. 30, 07:48:31 AM AKDT Alaska Time
<BR>Aug. 30, 05:48:31 AM HAST Hawaii-Aleutian Time
</PRE></H3><P><A HREF="http://www.usno.navy.mil"> US Naval Observatory</A>
</body></html>
If it's not too much trouble, please give an example for "<TITLE>".
#2 31.08.09 12:38
Re: Python3, html parsing
Code: python:
from xml.dom import minidom


def walk(nodelist):
    # Concatenate the text of all direct text-node children.
    data = ''
    for e in nodelist:
        if e.nodeType == e.TEXT_NODE:
            data += e.data
    return data


if __name__ == '__main__':
    # 'filename' is a placeholder for a saved, well-formed XML/XHTML file.
    xmldoc = minidom.parse('filename')
    node = xmldoc.getElementsByTagName("TITLE")[0]
    print(walk(node.childNodes))
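One caveat: minidom expects well-formed XML, and the sample page from post #1 is not (unclosed <BR> tags, an odd DOCTYPE), so parsing it that way would most likely raise an error. A more forgiving option in Python 3 might be html.parser from the standard library. Below is a minimal sketch under that assumption; the names TagTextExtractor and get_tag_text are hypothetical helpers made up for this illustration, and the sample string is a shortened copy of the page above.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag_name):
        super().__init__(convert_charrefs=True)
        # HTMLParser reports tag names in lower case, so normalize here.
        self.tag_name = tag_name.lower()
        self.depth = 0        # > 0 while we are inside the wanted tag
        self.results = []     # one string per closed occurrence
        self._chunks = []     # text pieces of the current occurrence

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def get_tag_text(html_text, tag_name):
    """Return a list with the text of each <tag_name>...</tag_name> pair."""
    parser = TagTextExtractor(tag_name)
    parser.feed(html_text)
    parser.close()
    return parser.results


# Shortened copy of the sample page from post #1.
sample = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final"//EN>
<html>
<body>
<TITLE>What time is it?</TITLE>
<H2> US Naval Observatory Master Clock Time</H2>
</body></html>"""

print(get_tag_text(sample, "TITLE"))
```

With the code from post #1 you would pass html_file instead of sample. The sketch only handles tags that have an explicit closing tag; for something like <BR> in the sample page it would need a different approach.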
#3 31.08.09 19:52
Re: Python3, html parsing
LLlypka, impressive. Could you recommend a book on Python + web + DOM? :)
#4 01.09.09 23:51
Re: Python3, html parsing

