Steins;Lab

A certain Tuan's private research lab

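[Study Notes] Monitoring Server Status with a PHP Probe and a Python Crawler

A Python crawler fetches the data returned by the Yahei PHP probe in order to monitor a server, retrieving the remote server's load, CPU, memory, network traffic, and real-time network speed. Keywords: PHP probe, Python, crawler, server monitoring.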

 

 

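0. Environment


This experiment was done on my Raspberry Pi 3B with Python 3.

The approach is admittedly roundabout: it goes through a probe just to get server information.

Still, for a web server that already runs PHP, hosting one more probe page costs next to nothing.

One day, while lying in bed doing nothing, it occurred to me that people keep impulse-buying VPSes and "making friends through probes". If I had a Raspberry Pi fetch the server status from the probe and show it on a 1602 LCD, I could lie on the sofa day and night watching the numbers tick. Wouldn't that be great? This post is my notes on fetching the data.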

 

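1. The Yahei PHP Probe


1.1 About PHP probes


A few words on PHP probes for readers who have not met them.

Yahei Lab  –  http://www.yahei.net/

[Yahei PHP Probe]
The biggest advantage of the Yahei PHP probe: it updates every second, with no need to refresh the page. It has a responsible maintainer who provides long-term support and updates.
For Linux systems (not recommended on Windows).
It shows disk usage, memory usage, network traffic, system load, server time, and more in real time, refreshing once per second.
It also reports the server IP address, the web server environment, PHP details, and so on.

Anyone who buys and tinkers with VPSes will know PHP probes. Put simply, a probe is a PHP program that collects system information and displays it on a web page. The interface of the Yahei PHP probe can be seen on the demo probe hosted on one of my DigitalOcean servers:

http://sfo01.misaka.cc:888/tz.php

As a result, people often buy all kinds of cheap, low-memory VPSes that can do little more than host a probe, get great satisfaction from it anyway, and compare notes on forums. This is known as "making friends through probes".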

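1.2 Analysis


Opening the probe page shows server information that refreshes in a loop. The idea is simple: crawl this page with a simple Python crawler.

First, open the probe page and take a look.

Demo:

http://sfo01.misaka.cc:888/tz.php

The page shows a table of live server data that refreshes dynamically, so fetching the page's HTML once cannot provide a continuous stream of server information. Since the page refreshes dynamically, data must be travelling between the server and the client. In Chrome, press F12 to open the developer tools and switch to the Network tab.

The URL requested by the dynamic refresh shows up right away. It is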

http://sfo01.misaka.cc:888/tz.php?act=rt&callback=jQuery1705809678890101435_1487402170358&_=1487402269387

Visiting that URL directly returns the following data.

jQuery1705809678890101435_1487402170358({"useSpace":"3.985","freeSpace":"15.577","hdPercent":"20.37","barhdPercent":"20.37%","TotalMemory":"490.23 M","UsedMemory":"414.12 M","FreeMemory":"76.11 M","CachedMemory":"84.05 M","Buffers":"104.9 M","TotalSwap":"0 M","swapUsed":"0 M","swapFree":"0 M","loadAvg":"0.00 0.00 0.00 1\/117","uptime":"3\u59293\u5c0f\u65f626\u5206\u949f","freetime":"","bjtime":"","stime":"2017-02-18 15:17:56","memRealPercent":"45.93","memRealUsed":"225.17 M","memRealFree":"265.06 M","memPercent":"84.47%","memCachedPercent":"17.15","barmemCachedPercent":"17.15%","swapPercent":"0","barmemRealPercent":"45.93%","barswapPercent":"0%","NetOut2":"44 K 505 B ","NetOut3":"2 G 825 M 602 K 970 B ","NetOut4":"","NetOut5":"","NetOut6":"","NetOut7":"","NetOut8":"","NetOut9":"","NetOut10":"","NetInput2":"44 K 505 B ","NetInput3":"3 G 145 M 314 K 713 B ","NetInput4":"","NetInput5":"","NetInput6":"","NetInput7":"","NetInput8":"","NetInput9":"","NetInput10":"","NetOutSpeed2":"45561","NetOutSpeed3":"3013176266","NetOutSpeed4":"0","NetOutSpeed5":"","NetInputSpeed2":"45561","NetInputSpeed3":"3373591241","NetInputSpeed4":"0","NetInputSpeed5":""})

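This confirms that the response contains the server's real-time information.

Notice the regular notation in the data that follows the callback name? Indeed: what sits between the parentheses is a JSON object.

JSON is a lightweight data-interchange format that is often used in place of XML. It is more compact than XML yet comparably expressive, so responses are smaller and faster to transfer.

You can think of it as a standardized format that practically every language can read and write, used for exchanging information.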
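The "callback name, then JSON in parentheses" shape of the response can be unwrapped in a few lines. The following is a minimal sketch, not code from the probe: the helper name unwrap_jsonp and the regular expression are my own assumptions, and the sample string is shortened from the response shown above.

```python
import json
import re

# Optional callback name, then a parenthesized payload.
# This pattern is an illustrative assumption, not part of the probe.
JSONP_RE = re.compile(r'^\s*[\w$.]*\s*\((.*)\)\s*;?\s*$', re.DOTALL)

def unwrap_jsonp(text):
    """Return the parsed JSON object wrapped inside a JSONP-style response."""
    match = JSONP_RE.match(text)
    if match is None:
        raise ValueError('not a JSONP-style response')
    return json.loads(match.group(1))

# Shortened sample in the same shape as the probe's reply
sample = ('jQuery1705809678890101435_1487402170358'
          '({"useSpace":"3.985","memPercent":"84.47%"})')
info = unwrap_jsonp(sample)
print(info['useSpace'])
```

Because the callback name is optional in the pattern, the same helper also accepts a reply that starts directly with an opening parenthesis.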

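The URL carries the parameter act=rt. Search (Ctrl+F) for it in the Yahei PHP probe source code.

 

It turns up immediately, at line 964 of tz.php:

//Real-time refresh via AJAX call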

if ($_GET['act'] == "rt")

{

	$arr=array('useSpace'=>"$du",'freeSpace'=>"$df",'hdPercent'=>"$hdPercent",'barhdPercent'=>"$hdPercent%",'TotalMemory'=>"$mt",'UsedMemory'=>"$mu",'FreeMemory'=>"$mf",'CachedMemory'=>"$mc",'Buffers'=>"$mb",'TotalSwap'=>"$st",'swapUsed'=>"$su",'swapFree'=>"$sf",'loadAvg'=>"$load",'uptime'=>"$uptime",'freetime'=>"$freetime",'bjtime'=>"$bjtime",'stime'=>"$stime",'memRealPercent'=>"$memRealPercent",'memRealUsed'=>"$memRealUsed",'memRealFree'=>"$memRealFree",'memPercent'=>"$memPercent%",'memCachedPercent'=>"$memCachedPercent",'barmemCachedPercent'=>"$memCachedPercent%",'swapPercent'=>"$swapPercent",'barmemRealPercent'=>"$memRealPercent%",'barswapPercent'=>"$swapPercent%",'NetOut2'=>"$NetOut[2]",'NetOut3'=>"$NetOut[3]",'NetOut4'=>"$NetOut[4]",'NetOut5'=>"$NetOut[5]",'NetOut6'=>"$NetOut[6]",'NetOut7'=>"$NetOut[7]",'NetOut8'=>"$NetOut[8]",'NetOut9'=>"$NetOut[9]",'NetOut10'=>"$NetOut[10]",'NetInput2'=>"$NetInput[2]",'NetInput3'=>"$NetInput[3]",'NetInput4'=>"$NetInput[4]",'NetInput5'=>"$NetInput[5]",'NetInput6'=>"$NetInput[6]",'NetInput7'=>"$NetInput[7]",'NetInput8'=>"$NetInput[8]",'NetInput9'=>"$NetInput[9]",'NetInput10'=>"$NetInput[10]",'NetOutSpeed2'=>"$NetOutSpeed[2]",'NetOutSpeed3'=>"$NetOutSpeed[3]",'NetOutSpeed4'=>"$NetOutSpeed[4]",'NetOutSpeed5'=>"$NetOutSpeed[5]",'NetInputSpeed2'=>"$NetInputSpeed[2]",'NetInputSpeed3'=>"$NetInputSpeed[3]",'NetInputSpeed4'=>"$NetInputSpeed[4]",'NetInputSpeed5'=>"$NetInputSpeed[5]");

	$jarr=json_encode($arr); 
	$_GET['callback'] = htmlspecialchars($_GET['callback']);

	echo $_GET['callback'],'(',$jarr,')';

	exit;

}

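Even without knowing PHP, the rule is easy to see: the script echoes the callback parameter, followed by the JSON wrapped in parentheses. In our URL the callback parameter is "jQuery1705809678890101435_1487402170358"; the trailing _=1487402269387 is a separate parameter, which the code shown here does not read. Try requesting http://sfo01.misaka.cc:888/tz.php?act=rt directly, without a callback.

The result is: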

({"useSpace":"3.986","freeSpace":"15.576","hdPercent":"20.38","barhdPercent":"20.38%","TotalMemory":"490.23 M","UsedMemory":"414.94 M","FreeMemory":"75.29 M","CachedMemory":"84.82 M","Buffers":"105.35 M","TotalSwap":"0 M","swapUsed":"0 M","swapFree":"0 M","loadAvg":"0.05 0.01 0.00 1\/117","uptime":"3\u59293\u5c0f\u65f644\u5206\u949f","freetime":"","bjtime":"","stime":"2017-02-18 15:35:36","memRealPercent":"45.85","memRealUsed":"224.77 M","memRealFree":"265.46 M","memPercent":"84.64%","memCachedPercent":"17.3","barmemCachedPercent":"17.3%","swapPercent":"0","barmemRealPercent":"45.85%","barswapPercent":"0%","NetOut2":"44 K 505 B ","NetOut3":"2 G 826 M 560 K 68 B ","NetOut4":"","NetOut5":"","NetOut6":"","NetOut7":"","NetOut8":"","NetOut9":"","NetOut10":"","NetInput2":"44 K 505 B ","NetInput3":"3 G 146 M 334 K 784 B ","NetInput4":"","NetInput5":"","NetInput6":"","NetInput7":"","NetInput8":"","NetInput9":"","NetInput10":"","NetOutSpeed2":"45561","NetOutSpeed3":"3014180932","NetOutSpeed4":"0","NetOutSpeed5":"","NetInputSpeed2":"45561","NetInputSpeed3":"3374660368","NetInputSpeed4":"0","NetInputSpeed5":""})

With that, the plan for the crawler is clear: request this URL, strip the parentheses, and parse the JSON.
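To make the role of each query parameter concrete, here is a small sketch using the standard urllib.parse module. The function name probe_url is hypothetical; the URLs are the ones shown earlier in this post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = 'http://sfo01.misaka.cc:888/tz.php'

def probe_url(base, callback=None):
    """Build the real-time endpoint URL, with or without a callback name."""
    params = [('act', 'rt')]
    if callback is not None:
        params.append(('callback', callback))
    return base + '?' + urlencode(params)

# Without a callback the reply starts directly with "("
print(probe_url(BASE))

# Splitting the URL seen in the browser shows three separate parameters
seen = (BASE + '?act=rt'
        '&callback=jQuery1705809678890101435_1487402170358'
        '&_=1487402269387')
print(parse_qs(urlsplit(seen).query))
```

Only act=rt is needed for the crawler, which is why the later examples request the URL without a callback.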

 

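2. A simple Python crawler


For a quick introduction to Python crawlers I referred to:

Python crawler tutorial – Cui Qingcai's personal blog

The articles are short and to the point. After skimming them I had picked up some of the basic ideas behind crawlers.

 

 

Getting the JSON data with the server information is fairly easy. First, verify it in the Python shell: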

pi@raspberrypi:~ $ python3
Python 3.4.2 (default, Oct 19 2014, 13:31:11) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib import request
>>> f = request.urlopen("http://138.197.193.89:888/tz.php?act=rt")
>>> print(f)
<http.client.HTTPResponse object at 0x7656ef70>
>>> data = f.read()
>>> print(data.decode('utf-8'))

Here I use my demo site's IP address directly to skip the DNS lookup. After decoding as UTF-8, the result is:

({"useSpace":"3.983","freeSpace":"15.579","hdPercent":"20.36","barhdPercent":"20.36%","T 0.00 0.00 1\/119","uptime":"3\u59290\u5c0f\u65f616\u5206\u949f","freetime":"","bjtime":.49%","swapPercent":"0","barmemRealPercent":"51.14%","barswapPercent":"0%","NetOut2":"4437 M 409 K 258 B ","NetInput4":"","NetInput5":"","NetInput6":"","NetInput7":"","NetInputetInputSpeed4":"0","NetInputSpeed5":""})

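The printed string is not valid JSON on its own: it has an extra parenthesis on each side. Python's string methods make these easy to remove. Note, however, that data itself is not a str, so trying to strip the parentheses from it with a str argument raises an error.

At this point data is of type "bytes". It is decode('utf-8') that turns it into a str; the str() wrapper below is redundant but harmless: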

>>> type(data)
<class 'bytes'>
>>> data2 = str(data.decode('utf-8')).strip('(').strip(')')
>>> print(data2)
{"useSpace":"3.983","freeSpace":"15.579","hdPercent":"20.36","barhdPercent":"20.36%","TotalMemory":"490.23 M","UsedMemory":"427.5 M","FreeMemory":"62.73 M","CachedMemory":"76.5 M","Buffers":"98.95 M","TotalSwap":"0 M","swapUsed":"0 M","swapFree":"0 M","loadAvg":"0.00 0.00 0.00 1\/121","uptime":"3\u59290\u5c0f\u65f624\u5206\u949f","freetime":"","bjtime":"","stime":"2017-02-18 12:15:45","memRealPercent":"51.41","memRealUsed":"252.05 M","memRealFree":"238.18 M","memPercent":"87.2%","memCachedPercent":"15.6","barmemCachedPercent":"15.6%","swapPercent":"0","barmemRealPercent":"51.41%","barswapPercent":"0%","NetOut2":"44 K 505 B ","NetOut3":"2 G 817 M 1005 K 862 B ","NetOut4":"","NetOut5":"","NetOut6":"","NetOut7":"","NetOut8":"","NetOut9":"","NetOut10":"","NetInput2":"44 K 505 B ","NetInput3":"3 G 137 M 942 K 336 B ","NetInput4":"","NetInput5":"","NetInput6":"","NetInput7":"","NetInput8":"","NetInput9":"","NetInput10":"","NetOutSpeed2":"45561","NetOutSpeed3":"3005200222","NetOutSpeed4":"0","NetOutSpeed5":"","NetInputSpeed2":"45561","NetInputSpeed3":"3365845328","NetInputSpeed4":"0","NetInputSpeed5":""}
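The bytes-versus-str distinction can be reproduced offline. This is a small sketch with a shortened stand-in for the value returned by f.read(); the variable names are only for illustration.

```python
raw = b'({"useSpace":"3.983"})'    # shortened stand-in for f.read()

try:
    raw.strip('(')                 # str argument on a bytes object
except TypeError as err:
    print('TypeError:', err)

text = raw.decode('utf-8')         # bytes -> str
print(type(text))
print(text.strip('()'))            # strips either parenthesis from both ends

# bytes has its own strip(), which expects a bytes argument
print(raw.strip(b'()'))
```

Passing '()' to strip() removes any leading or trailing run of those two characters, so it does the same job here as chaining strip('(') and strip(')').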

data2 can now be loaded by the json module into a dictionary:

>>> import json
>>> json.loads(data2)
{'hdPercent': '20.36', 'NetInput9': '', 'swapFree': '0 M', 'NetOutSpeed5': '', 'UsedMemory': '427.5 M', 'NetOut10': '', 'NetInput4': '', 'NetInputSpeed5': '', 'FreeMemory': '62.73 M', 'NetInputSpeed4': '0', 'NetOut7': '', 'TotalSwap': '0 M', 'NetOut2': '44 K 505 B ', 'NetOut5': '', 'NetOut8': '', 'NetInput5': '', 'NetOut4': '', 'NetInputSpeed2': '45561', 'memCachedPercent': '15.6', 'NetInputSpeed3': '3365845328', 'loadAvg': '0.00 0.00 0.00 1/121', 'TotalMemory': '490.23 M', 'barmemRealPercent': '51.41%', 'NetOut6': '', 'NetInput7': '', 'barswapPercent': '0%', 'NetOutSpeed2': '45561', 'barhdPercent': '20.36%', 'stime': '2017-02-18 12:15:45', 'useSpace': '3.983', 'bjtime': '', 'barmemCachedPercent': '15.6%', 'memRealFree': '238.18 M', 'NetInput3': '3 G 137 M 942 K 336 B ', 'NetInput6': '', 'uptime': '3天0小时24分钟', 'NetOutSpeed4': '0', 'NetInput2': '44 K 505 B ', 'freetime': '', 'NetOut3': '2 G 817 M 1005 K 862 B ', 'NetInput10': '', 'memRealUsed': '252.05 M', 'Buffers': '98.95 M', 'freeSpace': '15.579', 'memPercent': '87.2%', 'NetOutSpeed3': '3005200222', 'swapUsed': '0 M', 'CachedMemory': '76.5 M', 'NetOut9': '', 'swapPercent': '0', 'memRealPercent': '51.41', 'NetInput8': ''}
>>> data3=json.loads(data2)
>>> type(data3)
<class 'dict'>
>>> type(data3['CachedMemory'])
<class 'str'>

Done. What remains is to wrap this up in an object-oriented way and make the code more robust.
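One detail from the session above is worth keeping in mind: every value in the dictionary is a str, even the numeric ones. Below is a minimal sketch of converting a few fields before doing arithmetic on them. The helper names are hypothetical, and parse_megabytes assumes the field is reported with the unit M, as it is in the samples in this post.

```python
def parse_percent(value):
    """'84.64%' or '45.85' -> float."""
    return float(value.rstrip('%'))

def parse_load(value):
    """'0.05 0.01 0.00 1/117' -> the three load averages as floats."""
    one, five, fifteen = value.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_megabytes(value):
    """'490.23 M' -> 490.23 (assumes the unit is M, as in the samples)."""
    number, unit = value.split()
    if unit != 'M':
        raise ValueError('unexpected unit: ' + unit)
    return float(number)

# Values copied from the responses shown earlier
sample = {
    'memPercent': '84.64%',
    'loadAvg': '0.05 0.01 0.00 1/117',
    'TotalMemory': '490.23 M',
}
print(parse_percent(sample['memPercent']))
print(parse_load(sample['loadAvg']))
print(parse_megabytes(sample['TotalMemory']))
```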

 

3. Wrapping it in a class

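# -*- coding:utf-8 -*-
from urllib import request
import json

# Probe crawler class
class PHPTZ:

    # Initializer: store the URL of the probe's real-time endpoint
    def __init__(self):
        self.url = 'http://138.197.193.89:888/tz.php?act=rt'

    # Fetch the probe data and return it as a dict (None on failure)
    def getData(self):
        try:
            f = request.urlopen(self.url)
            data = f.read()
            # Decode the bytes, then strip the surrounding parentheses
            data2 = data.decode('utf-8').strip('(').strip(')')
            dataj = json.loads(data2)
            print(dataj)
            print(type(dataj))
            return dataj

        except Exception:
            # Any network or parsing failure ends up here
            print('Error')
            return None

myserver = PHPTZ()
myserver.getData()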

Run it:

pi@raspberrypi:~ $ sudo python3 tz.py
{'NetInput7': '', 'NetInput5': '', 'NetOut2': '44 K 505 B ', 'uptime': '3天4小时48分钟', 'loadAvg': '0.00 0.00 0.00 1/115', 'NetInput10': '', 'stime': '2017-02-18 16:39:49', 'NetInput4': '', 'NetOutSpeed2': '45561', 'NetInputSpeed3': '3379146879', 'freetime': '', 'NetOut9': '', 'UsedMemory': '418.66 M', 'hdPercent': '20.39', 'swapFree': '0 M', 'NetOut7': '', 'CachedMemory': '87.81 M', 'NetInput3': '3 G 150 M 620 K 127 B ', 'NetOut3': '2 G 830 M 296 K 887 B ', 'NetInputSpeed4': '0', 'NetOut6': '', 'NetInput2': '44 K 505 B ', 'memRealPercent': '45.61', 'FreeMemory': '71.57 M', 'NetInput8': '', 'NetOut8': '', 'memRealFree': '266.66 M', 'freeSpace': '15.573', 'swapPercent': '0', 'barmemRealPercent': '45.61%', 'memCachedPercent': '17.91', 'TotalMemory': '490.23 M', 'NetInputSpeed2': '45561', 'barmemCachedPercent': '17.91%', 'NetInputSpeed5': '', 'TotalSwap': '0 M', 'NetOut4': '', 'barhdPercent': '20.39%', 'Buffers': '107.28 M', 'useSpace': '3.989', 'memPercent': '85.4%', 'bjtime': '', 'NetOutSpeed4': '0', 'NetInput6': '', 'memRealUsed': '223.57 M', 'barswapPercent': '0%', 'swapUsed': '0 M', 'NetOut5': '', 'NetInput9': '', 'NetOutSpeed5': '', 'NetOutSpeed3': '3018105719', 'NetOut10': ''}
<class 'dict'>

 

I have only a passing acquaintance with error handling so far, so this is just a first attempt at it.
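As one possible next step, the failure cases can be told apart instead of being caught in a single block. This is a hedged sketch rather than the version used above: the function name fetch_probe, the opener parameter, the 5-second timeout, and the two stand-in openers are assumptions made so that the example runs without a network connection.

```python
import io
import json
from urllib import request, error

def fetch_probe(url, opener=request.urlopen, timeout=5):
    """Return the probe data as a dict, or None if anything goes wrong."""
    try:
        with opener(url, timeout=timeout) as f:
            text = f.read().decode('utf-8')
        return json.loads(text.strip().strip('()'))
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        print('Request failed:', err)
    except ValueError as err:
        # Covers both undecodable bytes and text that is not valid JSON
        print('Bad response:', err)
    return None

# Offline demonstration with stand-in openers (hypothetical, no network)
def fake_ok(url, timeout):
    return io.BytesIO(b'({"stime":"2017-02-18 16:39:49"})')

def fake_down(url, timeout):
    raise error.URLError('connection refused')

print(fetch_probe('http://example.invalid/tz.php?act=rt', opener=fake_ok))
print(fetch_probe('http://example.invalid/tz.php?act=rt', opener=fake_down))
```

Passing a timeout matters for a monitoring loop: without one, a stalled connection could block the script indefinitely.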

 

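4. Applications


With the data in hand, processing it any way you like is easy.

Especially with something like a Raspberry Pi, the possibilities are endless. I will be trying to build something new with it soon.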
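As a taste of what such processing might look like, here is a sketch that estimates network speed and prepares two 16-character lines for a 1602 display. It rests on an observation from the samples in this post: NetInputSpeed3 ("3373591241") equals NetInput3 ("3 G 145 M 314 K 713 B") expressed in bytes, so it appears to be a cumulative byte counter. All function names and the line layout are my own assumptions, and no actual LCD driver is involved.

```python
def rate_bytes_per_sec(prev, curr, interval):
    """Average rate between two cumulative counters `interval` seconds apart."""
    return (int(curr) - int(prev)) / interval

def human(rate):
    """Format a byte rate with binary units, e.g. 1536 -> '1.5K/s'."""
    for unit in ('B', 'K', 'M', 'G'):
        if rate < 1024 or unit == 'G':
            return '{:.1f}{}/s'.format(rate, unit)
        rate /= 1024

def lcd_lines(data, rx_rate):
    """Two strings of exactly 16 characters, one per row of a 1602 display."""
    load = data['loadAvg'].split()[0]
    line1 = 'L{} M{}%'.format(load, data['memRealPercent'])
    line2 = 'RX {}'.format(human(rx_rate))
    return line1[:16].ljust(16), line2[:16].ljust(16)

# Two NetInputSpeed3 readings from this post, taken at 15:17:56 and 15:35:36
rx = rate_bytes_per_sec('3373591241', '3374660368', 1060)
sample = {'loadAvg': '0.05 0.01 0.00 1/117', 'memRealPercent': '45.85'}
for line in lcd_lines(sample, rx):
    print(repr(line))
```

In a real loop the two readings would come from consecutive requests a second or so apart, which gives a much closer approximation of the instantaneous speed.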

 

Clarke
3 years ago

The Yahei PHP probe is good, but sadly it has not been updated in 4 years, and the last version, 0.4.7, does not even support PHP 7.
