I am looking to scrape a data point using Python from the URL http://www.cavirtex.com/orderbook.
The data point I want is the lowest bid, which currently looks like this:
<tr>
<td><b>Jan. 19, 2014, 2:37 a.m.</b></td>
<td><b>0.0775/0.1146</b></td>
<td><b>860.00000</b></td>
<td><b>66.65 CAD</b></td>
</tr>
The relevant value is the 860.00. I want to build this into a script that emails me an alert when the price differs by a certain amount from other exchanges.
I'm quite new to this, so if you could explain your thought process behind each step, it would be very much appreciated.
Thank you in advance!
Edit: This is what I have so far. It returns the page title correctly, but I'm having trouble grabbing the table data.
import urllib2, sys
from bs4 import BeautifulSoup
site= "http://cavirtex.com/orderbook"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
print soup.title
Here is the code for scraping the lowest bid from the 'Buying BTC' table:
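from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()  # starts Firefox with a fresh default profile
browser.get('http://www.cavirtex.com/orderbook')
lowest_bid = float('inf')
# Select every cell of the 'Buying BTC' table; only the price cells parse as floats
elements = browser.find_elements(By.XPATH, '//div[@id="orderbook_buy"]/table/tbody/tr/td')
for element in elements:
    # strip() removes the listed characters from both ends, which drops the <b>...</b> wrapper
    text = element.get_attribute('innerHTML').strip('<b>|</b>')
    try:
        bid = float(text)
    except ValueError:
        continue  # cells such as the date or "66.65 CAD" are not plain numbers
    if lowest_bid > bid:
        lowest_bid = bid
browser.quit()
print(lowest_bid)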
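The reasoning behind the loop is easier to follow without a browser. The sketch below applies the same strip-then-float step to the four cells of the sample row from the question. `parse_bid` is a name introduced here for illustration only; it is not part of Selenium.

```python
def parse_bid(inner_html):
    """Return the cell's value as a float, or None if it is not a plain number."""
    # str.strip(chars) treats its argument as a set of characters to remove
    # from both ends, so this peels off the leading "<b>" and trailing "</b>".
    text = inner_html.strip('<b>|</b>')
    try:
        return float(text)
    except ValueError:
        return None

# The four cells of the sample row from the question
cells = [
    '<b>Jan. 19, 2014, 2:37 a.m.</b>',
    '<b>0.0775/0.1146</b>',
    '<b>860.00000</b>',
    '<b>66.65 CAD</b>',
]
bids = [value for value in map(parse_bid, cells) if value is not None]
print(min(bids))  # prints 860.0
```

Only the price cell survives the float conversion, which is why the loop can walk over every cell of the table without knowing which column holds the price.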
To install Selenium for Python on a Windows PC, run from a command line:
pip install selenium (or pip install selenium --upgrade if you already have it).
If you want the 'Selling BTC' table instead, then change "orderbook_buy" to "orderbook_sell".
If you want the 'Last Trades' table instead, then change "orderbook_buy" to "orderbook_trades".
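If you expect to switch between the tables often, one option is to build the XPath from the table id. This is only a sketch: the three ids are the ones named above, while the short labels and the `cell_xpath` helper are hypothetical names chosen for this example.

```python
# Table ids named in the answer; the keys are hypothetical short labels.
TABLE_IDS = {
    "buy": "orderbook_buy",
    "sell": "orderbook_sell",
    "trades": "orderbook_trades",
}

def cell_xpath(table):
    """Build the XPath that selects every cell of the chosen table."""
    return '//div[@id="%s"]/table/tbody/tr/td' % TABLE_IDS[table]

print(cell_xpath("sell"))  # prints //div[@id="orderbook_sell"]/table/tbody/tr/td
```

The returned string can be passed to the XPath lookup in place of the hard-coded expression.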
Note:
If performance is critical, you can fetch the page with a plain HTTP request instead of driving a browser through Selenium, and your program will run much faster. However, your code will probably end up a lot "messier", because you will have to parse the returned HTML yourself.
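As a rough idea of what that parsing involves, here is a minimal sketch using only the Python 3 standard library. It assumes the rows arrive in the raw HTML exactly as in the sample row from the question, which may not be true if the page fills its tables with JavaScript. `CellCollector` is a name made up for this example, and the sample string stands in for the downloaded page.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # text inside <b> is still inside the cell

# Stand-in for the downloaded page: the sample row from the question
sample = """
<tr>
<td><b>Jan. 19, 2014, 2:37 a.m.</b></td>
<td><b>0.0775/0.1146</b></td>
<td><b>860.00000</b></td>
<td><b>66.65 CAD</b></td>
</tr>
"""

collector = CellCollector()
collector.feed(sample)
# In the sample row the third column holds the price
prices = [float(row[2]) for row in collector.rows if len(row) > 2]
print(min(prices))  # prints 860.0
```

Compared with the Selenium version, this relies on the column position rather than on trying every cell, so it would need adjusting if the table layout differs from the sample.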
Here is the code for sending the previous output in an email from yourself to yourself:
import smtplib, ssl

def SendMail(username, password, contents):
    server = Connect(username)
    try:
        server.login(username, password)
        server.sendmail(username, username, contents)
    except smtplib.SMTPException as error:
        print(error)
    Disconnect(server)

def Connect(username):
    # "user@gmail.com" -> "gmail"; split on "@" first so dots in the user name are ignored
    serverName = username.split("@", 1)[1].split(".", 1)[0]
    while True:  # keep retrying until a connection is established
        try:
            server = smtplib.SMTP(serverDict[serverName])
        except smtplib.SMTPException as error:
            print(error)
            continue
        try:
            server.ehlo()
            if server.has_extn("starttls"):
                server.starttls()  # switch to an encrypted connection when offered
                server.ehlo()
        except (smtplib.SMTPException, ssl.SSLError) as error:
            print(error)
            Disconnect(server)
            continue
        break
    return server

def Disconnect(server):
    try:
        server.quit()
    except smtplib.SMTPException as error:
        print(error)

# SMTP host for each supported provider, keyed by the first label of the e-mail domain
serverDict = {
    "gmail"  : "smtp.gmail.com",
    "hotmail": "smtp.live.com",
    "yahoo"  : "smtp.mail.yahoo.com"
}

SendMail("your_username@your_provider.com", "your_password", str(lowest_bid))
The above code should work if your email provider is Gmail, Hotmail or Yahoo.
Please note that, depending on your firewall configuration, the firewall may ask for your permission the first time you run it.
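Since the goal in the question is an alert on price differentials rather than an email on every run, here is one possible sketch of that decision step, written for Python 3. Everything in it is an assumption made for illustration: the function names, the 2% default threshold, and the 800.00 reference price for "another exchange" are invented, and only the 860.00 bid comes from the question.

```python
from email.message import EmailMessage

def price_differential(price, reference):
    """Relative difference of price against reference, e.g. 0.05 for 5% higher."""
    return (price - reference) / reference

def build_alert(sender, price, reference, threshold=0.02):
    """Return an EmailMessage if the differential reaches threshold, else None."""
    diff = price_differential(price, reference)
    if abs(diff) < threshold:
        return None  # prices are close enough, no alert needed
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sender
    msg["Subject"] = "Price alert: %.2f%% differential" % (diff * 100)
    msg.set_content("Lowest bid: %.2f\nReference price: %.2f" % (price, reference))
    return msg

# 860.00 is the bid from the question; 800.00 is a made-up price for another exchange
alert = build_alert("your_username@your_provider.com", 860.00, 800.00)
if alert is not None:
    print(alert["Subject"])
```

If an alert is returned, its text form (`alert.as_string()`) could be handed to `server.sendmail` in place of the bare price string, so that the email arrives with a subject line.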
Source: http://stackoverflow.com/questions/21217034/scrape-data-point-using-python