pythonlibxml2

Published: 2023-05-24 10:04:16

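① What dependencies are needed to compile and install Python from source

Dependencies:

// Install these with apt
1. gcc, make, zlib1g-dev (compression/decompression library)
Required by the build and install process itself.
2. libbz2-dev
Provides bz2 support. If it is missing when Python is compiled, pip install cannot unpack third-party packages shipped as .tar.bz2 archives (for example Twisted, which the crawler library Scrapy depends on) and fails with "unsupported archive format: .tar.bz2".
3. libsqlite3-dev
Provides sqlite3 support. If it is missing when Python is compiled, the sqlite3 module is left out of the build, and importing sqlite3, or any third-party library that depends on it (for example Scrapy), fails with "ImportError: No module named '_sqlite3'".
// The libraries above must be installed before compiling. The list may be incomplete and will be extended over time.
4. Others: libraries needed when installing third-party packages
python3-dev, libxml2-dev, libxslt1-dev, libffi-dev, libssl-dev and so on. Each third-party package documents what it needs, so they are not covered in detail here.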
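After building, it can help to confirm that the optional standard-library modules were actually compiled in. The following is a minimal sketch, not part of the original answer; the mapping from module name to apt package is an assumption based on the list above (the ssl and ctypes entries in particular are my own additions).

```python
import importlib

# Assumed mapping: optional stdlib module -> apt package it needs at build time.
OPTIONAL_MODULES = {
    "zlib": "zlib1g-dev",
    "bz2": "libbz2-dev",
    "sqlite3": "libsqlite3-dev",
    "ssl": "libssl-dev",
    "ctypes": "libffi-dev",
}


def check_optional_modules():
    """Return {module_name: True/False} depending on whether it imports."""
    status = {}
    for name in OPTIONAL_MODULES:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


status = check_optional_modules()
for name, ok in status.items():
    hint = "" if ok else " (install %s, then rebuild Python)" % OPTIONAL_MODULES[name]
    print("%-8s %s%s" % (name, "ok" if ok else "MISSING", hint))
```

Run it with the freshly built interpreter; any line marked MISSING points at a -dev package that was absent when ./configure ran.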

Installation:

// Download the source archive with wget; version 3.6.1 is used here
wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tar.xz
// Unpack
tar xJf Python-3.6.1.tar.xz
cd Python-3.6.1
./configure
make
/* If the next step needs sudo, use sudo -H, i.e. sudo -H make install, so that installing pip and similar modules does not fail.
Example of the error (pip installation fails): The directory '/home/ls/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
*/
make install

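② How to use lxml etree in Python

lxml is the most feature-rich and easiest-to-use Python library for processing XML and HTML.

lxml is a Pythonic binding for the two C libraries libxml2 and libxslt. What makes it unique is that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API. It is compatible with the ElementTree API but superior to it.

Programming with libxml2 is like the thrilling embrace of an exotic stranger: it seems able to fulfil your wildest dreams, but a voice deep inside keeps warning you that you are about to get burned in the worst possible way. That is why lxml exists.

This is a tutorial on processing XML with lxml.etree. It briefly outlines the main concepts of the ElementTree API, along with a few simple enhancements that make a programmer's life easier.

First, import lxml.etree:

from lxml import etree

To help with writing portable code, the examples in this tutorial mark clearly which parts of the API are lxml.etree extensions to the original ElementTree API (as defined by Fredrik Lundh's ElementTree library).

Element is the main container class of the ElementTree API, and most XML tree functionality is accessed through it. Creating an Element is easy:

root = etree.Element("root")

The XML tag name of an element is accessed through its tag attribute:

>>> print(root.tag)
root

Elements are organised into an XML tree structure. To create a child element and add it to a parent element, use the append method:

>>> root.append(etree.Element("child1"))

There is a shorter and more efficient way to do this: the SubElement factory. It accepts the same arguments as Element but additionally requires the parent element as its first argument:

>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")

You can serialise the tree you have created:

>>> print(etree.tostring(root, pretty_print=True).decode())
<root>
  <child1/>
  <child2/>
  <child3/>
</root>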

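To make access to these child nodes easy and intuitive, elements mimic the behaviour of normal Python lists:

>>> child = root[0]
>>> print(child.tag)
child1
>>> print(len(root))
3
>>> root.index(root[1])  # lxml.etree only!
1
>>> children = list(root)
>>> for child in root:
...     print(child.tag)
child1
child2
child3
>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end = root[-1:]
>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3

Previously you could also test the truth value of an element to see whether it has children, but this is no longer supported:

if root:  # this no longer works!
    print("The root element has children")

Using len(element) is more explicit and less error-prone:

>>> print(etree.iselement(root))  # test if it's some kind of Element
True
>>> if len(root):  # test if it has children
...     print("The root element has children")
The root element has children

There is one more important difference from lists: assigning an element to a new position moves it instead of copying it. The example shows this more clearly than a description:

>>> for child in root:
...     print(child.tag)
child0
child1
child2
child3
>>> root[0] = root[-1]  # this moves the element
>>> for child in root:
...     print(child.tag)
child3
child1
child2
>>> l = [0, 1, 2, 3]
>>> l[0] = l[-1]
>>> l
[3, 1, 2, 3]
>>> root is root[0].getparent()  # lxml.etree only!
True

To copy an element to a different position in lxml.etree, create an independent deep copy with the copy module from Python's standard library:

>>> from copy import deepcopy
>>> element = etree.Element("neu")
>>> element.append(deepcopy(root[1]))
>>> print(element[0].tag)
child1
>>> print([c.tag for c in root])
['child3', 'child1', 'child2']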

XML elements support attributes. They can be created like this:

>>> root = etree.Element("root", interesting="totally")
>>> etree.tostring(root)
b'<root interesting="totally"/>'

Attributes are unordered name-value pairs, so a convenient way of handling them is the dictionary-like interface of elements:

>>> print(root.get("interesting"))
totally
>>> print(root.get("hello"))
None
>>> root.set("hello", "Huhu")
>>> print(root.get("hello"))
Huhu
>>> etree.tostring(root)
b'<root interesting="totally" hello="Huhu"/>'
>>> sorted(root.keys())
['hello', 'interesting']
>>> for name, value in sorted(root.items()):
...     print('%s = %r' % (name, value))
hello = 'Huhu'
interesting = 'totally'

If you need a real dict-like object, use the attrib property:

>>> attributes = root.attrib
>>> print(attributes["interesting"])
totally
>>> print(attributes.get("no-such-attribute"))
None
>>> attributes["hello"] = "Guten Tag"
>>> print(attributes["hello"])
Guten Tag
>>> print(root.get("hello"))
Guten Tag

Because attrib is a dict-like object backed by the element itself, any change to the element is reflected in attrib and vice versa. It also means that the XML tree stays in memory for as long as the attrib of any of its elements is in use. To get an independent snapshot of the attributes that does not depend on the XML tree, copy it into a dict:

>>> d = dict(root.attrib)
>>> sorted(d.items())
[('hello', 'Guten Tag'), ('interesting', 'totally')]
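The difference between the live attrib view and a dict snapshot can be sketched as follows. This is an illustration under the assumption that either lxml or the API-compatible standard-library module is available; the variable names live and snapshot are mine.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the API-compatible standard-library module
    import xml.etree.ElementTree as etree

root = etree.Element("root", interesting="totally")

live = root.attrib            # live view onto the element's attributes
snapshot = dict(root.attrib)  # independent copy taken at this moment

root.set("hello", "Huhu")     # change the element after taking both

print(sorted(live.items()))      # sees the new attribute
print(sorted(snapshot.items()))  # still holds only the original one
```

The snapshot is the safer choice when the values must outlive the tree or must not change underneath the caller.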

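③ Installing libxml2-python on Linux never seems to work. What is going on?

I upgraded Python on a Linux machine from 2.4 to 2.7 and then installed libxml2 with sudo yum install libxml2-python. Every run reported success, yet import libxml2 in the Python shell always failed. It turned out that yum install puts the package into the site-packages directory of Python 2.4, the system default, so the 2.7 interpreter never sees it.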
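A quick way to see which interpreter is running and whether it can find a module is sketched below. It assumes a Python 3 interpreter (importlib.util is not available on the 2.4/2.7 versions in the question), and the helper name locate is hypothetical.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    return getattr(spec, "origin", None) if spec else None


print("interpreter:", sys.executable)
print("version:    ", sys.version.split()[0])
print("libxml2:    ", locate("libxml2") or "not importable from this interpreter")
```

If the module is reported as not importable while the package manager says it is installed, the package most likely went into the site-packages directory of a different interpreter.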

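④ Using XPath in Python (in detail)

Install the lxml package before use.

Getting started #

As with BeautifulSoup, the first step is to obtain a document tree.

Convert text into a document tree object:

from lxml import etree
if __name__ == '__main__':
    doc='''

Convert a file into a document tree object:

from lxml import etree
# Read the external file index.html
html = etree.parse('./index.html')
result = etree.tostring(html, pretty_print=True)  # pretty_print=True formats the output
print(result)

Both print the content of the document.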
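Both routes can be sketched end to end as follows. The HTML snippet is hypothetical sample data (not the document used in the original post), and an in-memory StringIO stands in for index.html so the example is self-contained.

```python
from io import StringIO

from lxml import etree

# Hypothetical sample markup; the second <li> is deliberately left unclosed.
doc = """
<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a>
  </ul>
</div>
"""

# 1) From a string: etree.HTML repairs the markup and returns the root element.
html = etree.HTML(doc)
print(etree.tostring(html, pretty_print=True).decode())

# 2) From a file or file-like object: pass an HTMLParser so that markup which
#    is not well-formed XML does not raise a syntax error.
tree = etree.parse(StringIO(doc), etree.HTMLParser())
links = tree.xpath("//a/@href")
print(links)
```

etree.parse uses an XML parser by default, so passing etree.HTMLParser() is the safer choice for real-world HTML pages.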

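Nodes, elements, attributes, content #

The idea behind XPath is to locate nodes through path expressions. Nodes include elements, attributes and content.

Element examples

html --->...div --->

Here an element means the same thing as a tag in HTML. A bare element name says nothing about where to look, so in practice it is combined with the path expressions below.

Path expressions #

/    the root node; also the separator between steps
//   any position in the document
.    the current node
..   the parent node
@    an attribute

Wildcards #

*       any element
@*      any attribute
node()  any child node (element, text, comment); attributes are not child nodes and are selected with @*

Predicates #

A condition in square brackets that restricts the matched elements is called a predicate.

//a[n]                 n is an integer greater than zero; selects each a that is the nth a child of its parent
//a[last()]            last() selects each a that is the last a child of its parent
//a[last()-1]          same idea; selects the second-to-last one
//a[position()<3]      position less than 3, i.e. the first two; this shows that XPath positions start at 1
//a[@href]             a elements that have an href attribute
//a[@href='www..com']  a elements whose href value is 'www..com'
//book[@price>2]       book elements whose price is greater than 2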
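The predicates above can be tried out on a small document. The XML below is invented purely for this illustration, and the variable names are mine.

```python
from lxml import etree

# Hypothetical data used only for this illustration.
root = etree.fromstring("""
<shelf>
  <a href="one.html">one</a>
  <a>two</a>
  <a href="three.html">three</a>
  <book price="1.5"><title>Cheap</title></book>
  <book price="30"><title>Pricey</title></book>
</shelf>
""")

first = root.xpath("//a[1]/text()")                       # positions start at 1
last = root.xpath("//a[last()]/text()")
second_last = root.xpath("//a[last()-1]/text()")
first_two = root.xpath("//a[position()<3]/text()")
with_href = root.xpath("//a[@href]/text()")               # attribute exists
exact_href = root.xpath("//a[@href='three.html']/text()")  # attribute equals
pricey = root.xpath("//book[@price>2]/title/text()")      # numeric comparison

for name, value in [("first", first), ("last", last),
                    ("second_last", second_last), ("first_two", first_two),
                    ("with_href", with_href), ("exact_href", exact_href),
                    ("pricey", pricey)]:
    print(name, value)
```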

Multiple paths #

Join two expressions with | to match either of them (a union):

//book/title | //book/price
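A short sketch of the union operator on an invented bookstore document (the data and variable names are assumptions made for this example):

```python
from lxml import etree

# Hypothetical data used only for this illustration.
root = etree.fromstring("""
<bookstore>
  <book><title>XML Basics</title><price>12.50</price></book>
  <book><title>XPath in Practice</title><price>20.00</price></book>
</bookstore>
""")

# One query returns both the title and the price elements.
nodes = root.xpath("//book/title | //book/price")
pairs = [(node.tag, node.text) for node in nodes]
for tag, text in pairs:
    print(tag, text)
```

Each node appears only once in the result, even if both sides of the union would match it.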

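Functions #

XPath has many built-in functions. For more, see https://www.w3school.com.cn/xpath/xpath_functions.asp

contains(string1,string2)

starts-with(string1,string2)

ends-with(string1,string2) # not supported by lxml (an XPath 2.0 function)

upper-case(string) # not supported by lxml (an XPath 2.0 function)

text()

last()

position()

node()

Note that last() is also a function; it already appeared in the predicates above.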

Examples #

Locating elements #

Match multiple elements; the result is a list.

from lxml import etree
if __name__ == '__main__':
    doc='''

[Result]

[<Element li at 0x2b41b749848>, <Element li at 0x2b41b749808>, <Element li at 0x2b41b749908>, <Element li at 0x2b41b749948>, <Element li at 0x2b41b749988>]
[] # no p element was found

html = etree.HTML(doc)
print(etree.tostring(html.xpath("//li[@class='item-inactive']")[0]))
print(html.xpath("//li[@class='item-inactive']")[0].text)
print(html.xpath("//li[@class='item-inactive']/a")[0].text)
print(html.xpath("//li[@class='item-inactive']/a/text()"))
print(html.xpath("//li[@class='item-inactive']/.."))
print(html.xpath("//li[@class='item-inactive']/../li[@class='item-0']"))

[Result]

b' third item \n '
None # the third li has no text of its own directly inside it, so .text is None
third item
['third item']
[<Element ul at 0x19cd8c4c848>]
[<Element li at 0x15ea3c5b848>, <Element li at 0x15ea3c5b6c8>]

Using functions #

contains #

Sometimes class is awkward as a selection condition: @class='....' is an exact match, and when the page styling changes, classes such as active may be added to or removed from the attribute. contains handles this conveniently.

from lxml import etree
if __name__ == '__main__':
    doc='''

[Result]

[<Element p at 0x23f4a9d12c8>, <Element li at 0x23f4a9d13c8>, <Element li at 0x23f4a9d1408>, <Element li at 0x23f4a9d1448>, <Element li at 0x23f4a9d1488>]
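The difference between an exact class match and contains can be sketched on a small hypothetical list (this is not the document from the original post). Note that contains is a plain substring test, so 'active' also matches 'inactive'; the third query shows a common XPath 1.0 idiom for matching a whole class token.

```python
from lxml import etree

# Hypothetical markup used only for this illustration.
html = etree.HTML("""
<ul>
  <li class="item active">one</li>
  <li class="item">two</li>
  <li class="item inactive">three</li>
</ul>
""")

# Exact match: the whole attribute value must equal 'item'.
exact = html.xpath("//li[@class='item']/text()")

# Substring match: also hits 'inactive'.
loose = html.xpath("//li[contains(@class, 'active')]/text()")

# Whole-token match: pad the class list with spaces and search for ' active '.
token = html.xpath(
    "//li[contains(concat(' ', normalize-space(@class), ' '), ' active ')]/text()"
)

print(exact, loose, token)
```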

starts-with #

from lxml import etree
if __name__ == '__main__':
    doc='''

[Result]

[<Element ul at 0x23384e51148>, <Element p at 0x23384e51248>, <Element li at 0x23384e51288>, <Element li at 0x23384e512c8>, <Element li at 0x23384e51308>, <Element li at 0x23384e51388>]
[<Element ul at 0x23384e51148>]

ends-with #

print(html.xpath("//*[ends-with(@class,'ul')]"))

[Result]

Traceback (most recent call last):
  File "F:/OneDrive/pprojects/shoes-show-spider/test/xp5_test.py", line 18, in <module>
    print(html.xpath("//*[ends-with(@class,'ul')]"))
  File "src\lxml\etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Unregistered function

So lxml does not support every function in the XPath function list.

upper-case #

Like ends-with, this function is not supported either and raises the same error, lxml.etree.XPathEvalError: Unregistered function:

print(html.xpath("//a[contains(upper-case(@class),'ITEM-INACTIVE')]"))
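Both missing functions can be approximated with XPath 1.0 built-ins: substring plus string-length for ends-with, and translate for upper-case. The sketch below uses hypothetical markup, handles ASCII letters only, and passes the two alphabets as XPath variables through keyword arguments.

```python
from lxml import etree

# Hypothetical markup used only for this illustration.
html = etree.HTML("""
<div>
  <ul class="list-ul"><li class="Item-Inactive"><a>third item</a></li></ul>
  <p class="ul-note">note</p>
</div>
""")

LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# ends-with(@class, 'ul'): compare the tail of the string with the suffix.
ends_with_ul = html.xpath(
    "//*[substring(@class, string-length(@class) - string-length('ul') + 1) = 'ul']"
)

# upper-case(@class): map every lowercase letter to its uppercase counterpart.
upper_match = html.xpath(
    "//li[contains(translate(@class, $lower, $upper), 'ITEM-INACTIVE')]",
    lower=LOWER,
    upper=UPPER,
)

print([el.tag for el in ends_with_ul])
print([el.tag for el in upper_match])
```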

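text, last #

# Only the last li matches
print(html.xpath("//li[last()]/a/text()"))
# Returns the content of every <a> element, because each <a> is the last <a> child of its own parent.
# Each li has only one <a> child, so every one of them is the last.
print(html.xpath("//li/a[last()]/text()"))
print(html.xpath("//li/a[contains(text(),'third')]"))

[Result]

['fifth item']
['second item', 'third item', 'fourth item', 'fifth item']
[<Element a at 0x26ab7bd1308>]

position #

print(html.xpath("//li[position()=2]/a/text()"))
# Result: ['third item']

This example was already explained above.

One question remains: can position() be used as a path step the way text() is?

print(html.xpath("//li[last()]/a/position()"))
# Result: lxml.etree.XPathEvalError: Unregistered function

The conclusion is that a function cannot be placed just anywhere in an expression and still give the expected result.

node #

Returns all child nodes regardless of their type (element or text content).

print(html.xpath("//ul/li[@class='item-inactive']/node()"))
print(html.xpath("//ul/node()"))

[Result]

[]
['\n ', , '\n ', , '\n ', , '\n ', , '\n ', , ' 閉合標簽\n ']

Getting content #

As mentioned above, the content of an element can be obtained with either .text or text().

from lxml import etree
if __name__ == '__main__':
    doc='''

[Result]

['first item','second item','third item','fourth item','fifth item']
first item
18
['\n ','\n ','\n ','\n ','\n ',' 閉合標簽\n ']

The results show the difference between text() and .text. In short, text() is part of the XPath expression and returns a list of all matching text nodes, while .text is an attribute of a single Element and holds only the text before its first child (or None if there is none).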
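The difference can be seen on a two-item list. The markup is hypothetical, and the sketch also shows .tail and itertext(), which are the usual ways to reach text that .text does not cover.

```python
from lxml import etree

# Hypothetical markup used only for this illustration.
html = etree.HTML(
    "<ul><li>plain text</li>"
    "<li><a href='link.html'>linked</a> trailing</li></ul>"
)
first, second = html.xpath("//li")

print(first.text)       # text directly inside the first li
print(second.text)      # None: the second li starts with a child element
print(second[0].text)   # text inside the <a>
print(second[0].tail)   # text that follows the <a> inside its parent

direct_text = html.xpath("//li/text()")   # text nodes that are children of li
all_text = html.xpath("//li//text()")     # text nodes at any depth below li
joined = "".join(second.itertext())       # all text of one element, in order

print(direct_text, all_text, joined)
```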

Getting attributes #

print(html.xpath("//a/@href"))
print(html.xpath("//li/@class"))

[Result]

['link1.html', 'link2.html', 'link3.html', 'link4.html', 'link5.html']
['item-0active', 'item-1', 'item-inactive', 'item-1', 'item-0']

Custom functions #

Using the functions above showed that some are supported and some are not. So which ones are supported? The lxml website has the answer: https://lxml.de/xpathxslt.html . lxml supports XPath 1.0; anything beyond that goes through the standards-compliant extension mechanisms of libxml2 and libxslt. The official XPath 1.0 document and other versions of the XPath specification: https://www.w3.org/TR/xpath/

lxml supports XPath 1.0, XSLT 1.0 and the EXSLT extensions through libxml2 and libxslt in a standards compliant way.

In addition, lxml lets you register custom functions to extend what XPath can do: https://lxml.de/extensions.html

from lxml import etree

# Define the function
def ends_with(context, s1, s2):
    return s1[0].endswith(s2)

if __name__ == '__main__':
    doc='''

[Result]

[<Element li at 0x2816ed30548>, <Element li at 0x2816ed30508>]
['first item', 'third item']

The parameter s1 receives the first XPath argument, @class; note that @class arrives as a list.

The parameter s2 receives the second XPath argument, 'active', which is a string.

Example from the official site: https://lxml.de/extensions.html

def hello(context, a):
    return "Hello %s" % a

from lxml import etree
ns = etree.FunctionNamespace(None)
ns['hello'] = hello
root = etree.XML('<a><b>Haegar</b></a>')
print(root.xpath("hello('Dr. Falken')"))
# Result: Hello Dr. Falken
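Putting the two pieces together, a complete custom ends-with might look like the sketch below. The markup is hypothetical, the registration under the name 'ends-with' is my assumption about how the original function was wired up, and the empty-list guard is an addition for elements that have no class attribute.

```python
from lxml import etree


def ends_with(context, values, suffix):
    """XPath extension: true if the first value ends with `suffix`.

    A node-set argument such as @class arrives as a list (possibly empty),
    while a string literal arrives as a plain string.
    """
    if isinstance(values, list):
        if not values:
            return False
        values = values[0]
    return str(values).endswith(suffix)


# Register the function in the default (prefix-less) function namespace.
ns = etree.FunctionNamespace(None)
ns["ends-with"] = ends_with

# Hypothetical markup used only for this illustration.
html = etree.HTML("""
<ul>
  <li class="item-0 active"><a>first item</a></li>
  <li class="item-1"><a>second item</a></li>
  <li class="item-inactive"><a>third item</a></li>
</ul>
""")

matches = html.xpath("//li[ends-with(@class, 'active')]/a/text()")
print(matches)
```

Registering in FunctionNamespace(None) makes the function available to every XPath evaluation in the process, so a prefixed namespace may be preferable in larger programs.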

⑤ make fails when building libxml2-2.9.1. How can the error be fixed?

You need the development library libpython-dev:
sudo apt-get install libpython-dev

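⑥ How to install libxml2, libxslt and lxml for Python on CentOS 7

lxml depends on the development packages of libxml2 and libxslt.
Check which of them is missing on the system and install it, for example libxslt-devel.
Then install lxml again.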
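Once lxml installs, the library versions it was built against and is running with can be printed as a sanity check. This is a minimal sketch; the helper name lxml_build_info is hypothetical, and it returns None instead of failing when lxml is still missing.

```python
def lxml_build_info():
    """Return lxml, libxml2 and libxslt versions, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (running)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (running)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
    }


info = lxml_build_info()
if info is None:
    print("lxml is not importable; install the missing -devel packages and retry")
else:
    for name, version in info.items():
        print("%-20s %s" % (name, ".".join(map(str, version))))
```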

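⑦ How to use libxml2 in Python

Simply import libxml2 to load the library, then call its functions directly. I have also read the libxml2 documentation; the definitions it gives are all in terms of C, but Python already has a binding for the library, so all of its functions can be called from Python. As for using the classes, I have not looked into that yet. Very little has been written directly about calling libxml2 from Python, so you have to explore on your own.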
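As a starting point, a minimal sketch of the binding in use is shown below: parse a string, run an XPath query and free the C structures afterwards. The function name and sample XML are hypothetical, and the function returns None when the libxml2 module is not installed for the running interpreter.

```python
def b_texts_with_libxml2(xml_text):
    """Return the text of every <b> element, or None if libxml2 is missing."""
    try:
        import libxml2
    except ImportError:
        return None
    doc = libxml2.parseDoc(xml_text)
    ctxt = doc.xpathNewContext()
    try:
        return [node.content for node in ctxt.xpathEval("//b")]
    finally:
        # The bindings wrap C structures, so release them explicitly.
        ctxt.xpathFreeContext()
        doc.freeDoc()


result = b_texts_with_libxml2("<a><b>Haegar</b><b>Hamlet</b></a>")
print(result)
```

The explicit freeing is the main difference from lxml, which manages memory for you.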

⑧ Importing libxml2dom in Python fails, please help

A recent Python project needed the libxml2dom package, so I installed it with pip and then checked whether the installation had worked:

shandow@mac:~ > python
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import libxml2dom
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/libxml2dom/__init__.py", line 24, in <module>
    from libxml2dom.macrolib import *
  File "/Library/Python/2.7/site-packages/libxml2dom/macrolib/__init__.py", line 26, in <module>
    from libxml2dom.macrolib.macrolib import *
  File "/Library/Python/2.7/site-packages/libxml2dom/macrolib/macrolib.py", line 30, in <module>
    from libxmlmods import libxml2mod
ImportError: No module named libxmlmods

In other words, the libxmlmods library is missing. Installing it with pip reported that it could not be found. A web search turned up what looks like the official page for libxml2dom:

libxml2dom

Current release: libxml2dom 0.5 (requiring the low-level libxml2 Python bindings, typically provided by the python-libxml2 or libxml2-python packages for various GNU/Linux distributions: Ubuntu, Debian, Fedora, Red Hat, SuSE)
Introduction
The libxml2dom package provides a traditional DOM wrapper around the Python bindings for libxml2. In contrast to the standard libxml2 bindings, libxml2dom provides an API reminiscent of minidom, pxdom and other Python-based and Python-related XML toolkits. Performance is fairly respectable since libxml2dom makes direct use of libxml2mod - the low-level wrapping of libxml2 for Python. Moreover, serialisation of documents is much faster than many other toolkits because libxml2dom can make direct use of libxml2 rather than employing Python-level mechanisms to visit and serialise nodes.
Copyright and Licence
libxml2dom is licensed under the LGPL version 3 (or later).

The very first sentence says that libxml2dom depends on the libxml2 bindings, which are packaged as python-libxml2 or libxml2-python. Installing either of them with pip reported that no such package could be found, so after another search I downloaded the source package by hand. I shared it here: libxml2 download (I am on Mac OS; Windows users will need to find their own download). Installation on Mac OS works as follows:
1. Put the downloaded tar archive into the site-packages directory of your Python environment; mine is:
/Library/Python/2.7/site-packages
2. Unpack it: sudo tar -xvf libxml2-2.7.8.tar
3. Enter the unpacked directory; because this is a source package, it has to be compiled and installed:
cd libxml2-2.7.8
sudo ./configure
sudo make
sudo make install
If no errors are reported, everything should be fine. Test it:
python
import libxml2
Python reports that there is no such module, so the import still does not work.
Go back into the directory:
cd libxml2-2.7.8
It contains a python directory; enter it:

shandow@mac:/Library/Python/2.7/site-packages/libxml2-2.7.8 > cd python/
shandow@mac:/Library/Python/2.7/site-packages/libxml2-2.7.8/python > ll
total 5704
-rw-r--r-- 1 root network 132 11 17 10:46 MANIFEST
-rw-r--r-- 1 root network 31357 11 17 10:43 Makefile
-rw-rw-r--@ 1 50138 network 1542 11 4 2010 Makefile.am
-rw-rw-r--@ 1 50138 network 32443 11 5 2010 Makefile.in
-rw-rw-r--@ 1 50138 network 1272 9 24 2009 README
-rw-rw-r--@ 1 50138 network 1623 9 24 2009 TODO
drwxr-xr-x 4 root network 136 11 17 10:46 build
-rw-rw-r--@ 1 50138 network 15061 9 24 2009 drv_libxml2.py
-rw-r--r-- 1 root network 0 11 17 10:44 gen_prog
-rwxrwxr-x@ 1 50138 network 47541 10 16 2010 generator.py
-rw-rw-r--@ 1 50138 network 104464 11 3 2010 libxml.c
-rw-r--r-- 1 root network 271 11 17 10:44 libxml.lo
-rw-r--r-- 1 root network 297240 11 17 10:44 libxml.o
-rw-rw-r--@ 1 50138 network 22817 10 12 2010 libxml.py
-rw-r--r-- 1 root network 126532 11 17 10:44 libxml2-export.c
-rw-r--r-- 1 root network 434813 11 17 10:44 libxml2-py.c
-rw-r--r-- 1 root network 112512 11 17 10:44 libxml2-py.h
-rw-r--r-- 1 root network 283 11 17 10:44 libxml2-py.lo
-rw-r--r-- 1 root network 819684 11 17 10:44 libxml2-py.o
-rw-rw-r--@ 1 50138 network 18669 10 12 2010 libxml2-python-api.xml
-rw-r--r-- 1 root network 341257 11 17 10:44 libxml2.py
-rw-r--r-- 1 root network 318440 11 17 10:44 libxml2class.py
-rw-r--r-- 1 root network 22768 11 17 10:44 libxml2class.txt
-rw-r--r-- 1 root network 1166 11 17 10:44 libxml2mod.la
-rw-rw-r--@ 1 50138 network 7277 10 12 2010 libxml_wrap.h
-rwxr-xr-x 1 root network 6685 11 17 10:43 setup.py
-rwxrwxr-x@ 1 50138 network 6707 9 24 2009 setup.py.in
drwxrwxr-x@ 55 50138 network 1870 11 17 10:43 tests
-rw-rw-r--@ 1 50138 network 21068 10 12 2010 types.c
-rw-r--r-- 1 root network 268 11 17 10:44 types.lo
-rw-r--r-- 1 root network 63320 11 17 10:44 types.o
shandow@mac:/Library/Python/2.7/site-packages/libxml2-2.7.8/python >
Following the usual way of installing third-party Python packages described online:
sudo python setup.py build
sudo python setup.py install
After these two commands have run, go back to the site-packages directory; a libxml2 egg-info file has appeared there.

Test again:

shandow@mac:/Library/Python/2.7/site-packages > python
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import libxml2
>>> import libxml2dom
>>>
libxml2 is now installed and libxml2dom no longer raises an error. I could not find a solution to this problem through any search engine, so this write-up is my own.

⑨ How to install the lxml library for Python

lxml is the most feature-rich and easiest-to-use Python library for XML and HTML work. It is not bundled with Python; it is a Pythonic binding for the libxml2 and libxslt libraries. What sets it apart is that it combines the speed and feature completeness of those libraries with the simplicity of a pure Python API, and it is compatible with the well-known ElementTree API while improving on it. Installing lxml can be a little awkward, though, because of its dependencies: when they are missing, installing directly with easy_install or pip fails with gcc errors. The installation steps for Windows and Linux are listed below:
[Windows]
First make sure that Python is installed, the environment variables are set, and easy_install and pip are installed.
1. Run pip install virtualenv
C:\>pip install virtualenv
Requirement already satisfied (use --upgrade to upgrade): virtualenv in c:\python27\lib\site-packages\virtualenv-12.0.4-py2.7.egg
2. Download the lxml file that matches your system and Python version from the official site:
http://pypi.python.org/pypi/lxml/2.3/
NOTE:
For example, with Python 2.7.4 on a 64-bit operating system, either of these can be downloaded:
lxml-2.3-py2.7-win-amd64.egg (md5) # Python Egg
or
lxml-2.3.win-amd64-py2.7.exe (md5) # MS Windows installer
3. Run easy_install lxml-2.3-py2.7-win-amd64.egg
D:\Downloads>easy_install lxml-2.3-py2.7-win-amd64.egg # run this from the directory that contains the file
Processing lxml-2.3-py2.7-win-amd64.egg
creating c:\python27\lib\site-packages\lxml-2.3-py2.7-win-amd64.egg
Extracting lxml-2.3-py2.7-win-amd64.egg to c:\python27\lib\site-packages
Adding lxml 2.3 to easy-install.pth file
Installed c:\python27\lib\site-packages\lxml-2.3-py2.7-win-amd64.egg
Processing dependencies for lxml==2.3
Finished processing dependencies for lxml==2.3
NOTE:
1. The .exe installer can be used instead; it is simpler, just run it.
2. Either easy_install or pip can be used.
# Run the import to confirm that the installation succeeded
>>> import lxml
>>>
3. With pip, the common commands are:
pip install simplejson # install a Python package
pip install --upgrade simplejson # upgrade a Python package
pip uninstall simplejson # uninstall a Python package
4. If you develop with Eclipse + PyDev, remove the old interpreter entry and load it again; otherwise imports will report errors:
Window --> Preferences --> PyDev --> Interpreter - Python
[Linux]
lxml depends on the following packages:
libxml2, libxml2-devel, libxslt, libxslt-devel, python-libxml2, python-libxslt
so the installation steps are:
Step 1: install libxml2
$ sudo apt-get install libxml2 libxml2-dev
Step 2: install libxslt
$ sudo apt-get install libxslt1.1 libxslt1-dev
Step 3: install python-libxml2 and python-libxslt
$ sudo apt-get install python-libxml2 python-libxslt1
Step 4: install lxml
$ sudo easy_install lxml

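⑩ How to read an XML file with Python

I. Introduction

XML (eXtensible Markup Language) was designed for transporting and storing data. It has become central to many newer technologies and is applied in many different fields. It emerged naturally as the web matured: it keeps the core features of SGML and the simplicity of HTML, and adds properties of its own such as being explicit and well structured.
There are three common ways to parse XML in Python. The first is the xml.dom.* modules, an implementation of the W3C DOM API; they are a good fit if you need the DOM API, but note that the xml.dom package contains many modules whose differences you need to keep apart. The second is the xml.sax.* modules, an implementation of the SAX API; they trade convenience for speed and low memory use. SAX is an event-based API, which means it can process huge documents "on the fly" without loading them completely into memory. The third is the xml.etree.ElementTree module (ET for short), which provides a lightweight, Pythonic API. ET is much faster than DOM and has many pleasant APIs. Compared with SAX, ET.iterparse also offers "on the fly" processing without loading the whole document into memory; the average performance of ET is about the same as SAX, but the API is more productive and more convenient to use.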
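The "on the fly" style mentioned for ET.iterparse can be sketched as follows. The XML is a shortened stand-in for the country data, held in memory so that the example is self-contained; the variable names are mine.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample data; a real use case would pass a file name or open file.
XML = b"""<?xml version="1.0"?>
<data>
  <country name="Singapore"><rank>4</rank></country>
  <country name="Panama"><rank>68</rank></country>
</data>"""

ranks = {}
# iterparse yields (event, element) pairs while the input is still being read.
for event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "country":
        ranks[elem.get("name")] = int(elem.find("rank").text)
        elem.clear()  # drop the children that are no longer needed

print(ranks)
```

Clearing each element after use is what keeps memory consumption low on large inputs.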
II. Details

The XML file to parse (country.xml):

<?xml version="1.0"?>
<data>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>

1. xml.etree.ElementTree

ElementTree was created for processing XML. Historically the Python standard library shipped two implementations: a pure-Python one, xml.etree.ElementTree, and a faster C one, xml.etree.cElementTree. Where both exist, prefer the C implementation, because it is faster and uses less memory.

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

This is a common way of letting different Python libraries share one API. Since Python 3.3 the ElementTree module automatically uses the C accelerator when it is available, so import xml.etree.ElementTree is all that is needed; xml.etree.cElementTree was deprecated in 3.3 and removed in Python 3.9.

#!/usr/bin/env python
# coding: utf-8

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import sys

try:
    tree = ET.parse("country.xml")  # open the XML document
    # root = ET.fromstring(country_string)  # parse XML from a string instead
    root = tree.getroot()  # get the root node
except Exception as e:
    print("Error: cannot parse file country.xml: %s" % e)
    sys.exit(1)
print(root.tag, "---", root.attrib)
for child in root:
    print(child.tag, "---", child.attrib)

print("*" * 10)
print(root[0][1].text)  # access by index
print(root[0].tag, root[0].text)
print("*" * 10)

for country in root.findall('country'):  # all country nodes under root
    rank = country.find('rank').text  # text of the rank child node
    name = country.get('name')  # value of the name attribute
    print(name, rank)

# Modify the XML file
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

III. Summary
(1) The libraries or modules available for XML parsing in Python include xml, libxml2, lxml and xpath; see their documentation for details.
(2) Every parsing approach has its own strengths and weaknesses; weigh them against your requirements before choosing.
(3) If anything is missing, please leave a comment. Thanks in advance!
