Friday, August 29, 2014

About the Google API

 * google drive API

Enable the API service for development in the Google Developers Console:
https://console.developers.google.com/project

Python quickstart documentation:
https://developers.google.com/drive/web/quickstart/quickstart-python

-- sample: copy a document to Google Drive
#!/usr/bin/python
# Copy a document to Google Drive.

import httplib2
import pprint

from apiclient.discovery import build
from apiclient.http import MediaFileUpload
from oauth2client.client import OAuth2WebServerFlow

from oauth2client.file import Storage
from oauth2client.util import logger


# Copy your credentials from the console
CLIENT_ID = '955147XXX1-o0jt3XXXXXXXXXXXXXXXXXXpfsnfe.apps.googleusercontent.com'
CLIENT_SECRET = 'UgXXXXXXXXXXXEoU7hG'

# Check https://developers.google.com/drive/scopes for all available scopes
OAUTH_SCOPE = 'https://www.googleapis.com/auth/drive'

# Redirect URI for installed apps
REDIRECT_URI = 'urn:ietf:wg:oauth:2.0:oob'

# Path to the file to upload
FILENAME = 'document.txt'

# Run through the OAuth flow and retrieve credentials
flow = OAuth2WebServerFlow(CLIENT_ID, CLIENT_SECRET, OAUTH_SCOPE, REDIRECT_URI)

# Reuse stored OAuth credentials if they already exist
storage = Storage('OAuthCredentials.txt')
credentials = storage.get()

if credentials is None or credentials.invalid:
    # Authorization Step 1: get the URL the user must visit
    authorize_url = flow.step1_get_authorize_url()
    print 'Go to the following link in your browser: ' + authorize_url
    code = raw_input('Enter verification code: ').strip()
    # Authorization Step 2: exchange the code for credentials
    credentials = flow.step2_exchange(code)

# Store the credentials in the local file
storage.put(credentials)

# Create an httplib2.Http object and authorize it with our credentials
http = httplib2.Http()
http = credentials.authorize(http)

drive_service = build('drive', 'v2', http=http)

# Insert a file
media_body = MediaFileUpload(FILENAME, mimetype='text/plain', resumable=True)
body = {
  'title': 'InsertTest1',
  'description': 'A test document',
  'mimeType': 'text/plain'
}

file = drive_service.files().insert(body=body, media_body=media_body).execute()
pprint.pprint(file)
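
To double-check that the upload worked, the same service object can list what the authorized account sees in Drive. A minimal sketch (the maxResults value and the printed fields are just illustrative; in Drive v2 the results come back under 'items'):

# List a few files visible to the authorized account
file_list = drive_service.files().list(maxResults=10).execute()
for item in file_list.get('items', []):
    print item['title'], item['id']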

Wednesday, August 20, 2014

Image analysis with Python

1. matplotlib (a Python 2D plotting library)
 - API reference:
http://matplotlib.org/api/pyplot_api.html

2. OpenCV (an open-source computer vision and machine learning library)
 - site:
http://opencv.org/
 - template matching
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html#which-are-the-matching-methods-available-in-opencv

3. NumPy (the fundamental package for scientific computing with Python)
http://www.numpy.org/

4. Sample code
# personal workaround: force the TkAgg backend before pyplot is imported
import matplotlib
matplotlib.use('TkAgg')
# begin
import cv2
import numpy as np
from matplotlib import pyplot as plt

pic = '/home/fox/park.JPG'
compare_pic = '/home/fox/car.JPG'

img_rgb = cv2.imread(pic)
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread(compare_pic,0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray,template,cv2.TM_CCOEFF_NORMED)
threshold = 0.7   # to keep only the best match, use np.amax(res) as the threshold
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
   cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2)

cv2.imwrite('/home/fox/res3.png',img_rgb)
plt.subplot(121),plt.imshow(res,cmap = 'gray')
plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(img_rgb,cmap = 'gray')
plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
plt.show()
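
If only the single best match is needed instead of every location above the threshold, cv2.minMaxLoc on the same res array gives it directly. A minimal sketch reusing the variables above:

# Find the single best match in the result map
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
top_left = max_loc                                   # for TM_CCOEFF_NORMED, higher is better
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img_rgb, top_left, bottom_right, (255, 0, 0), 2)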

Images, top to bottom: car.JPG, park.JPG, best_one.JPG, density_0.6.JPG

Monday, August 18, 2014

Some things about Linux

Reference link:
http://www.tecmint.com/install-google-chrome-on-redhat-centos-fedora-linux/

Step 1: Enable Google YUM repository

Create a file called /etc/yum.repos.d/google-chrome.repo and add the following lines to it.
[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/$basearch
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub

Step 2: Install the Chrome Web Browser

Download and install the Chrome web browser with the yum command; it will automatically install all dependencies.
# yum install google-chrome-stable
Update: Sadly, Google Chrome no longer supports the best-known commercial distribution, Red Hat, or its free clones such as CentOS and Scientific Linux.
Google has discontinued Chrome support for RHEL 6.X, while the latest Firefox and Opera browsers still run successfully on the same platforms.
Luckily, there is a script developed by Richard Lloyd that automatically downloads and installs the latest Google Chrome browser by picking libraries from a more recently released distro and putting them in the /opt/google/chrome/lib directory, so Google Chrome can run on CentOS 6.X.
# wget http://chrome.richardlloyd.org.uk/install_chrome.sh
# chmod u+x install_chrome.sh
# ./install_chrome.sh

Note that this script essentially tries to upgrade parts of the OS!



Tuesday, August 12, 2014

Notes on using Postgres-XL

INSTALL
-- initialize each instance
initgtm -Z gtm -D /var/lib/pgxl/9.2/data_gtm
initdb -D /var/lib/pgxl/9.2/coord01 --nodename coord01
initdb -D /var/lib/pgxl/9.2/data01 --nodename data01
initdb -D /var/lib/pgxl/9.2/data02 --nodename data02
-- start each instance
gtm_ctl -Z gtm start -D /var/lib/pgxl/9.2/data_gtm
pg_ctl start -D /var/lib/pgxl/9.2/data01 -Z datanode -l logfile
pg_ctl start -D /var/lib/pgxl/9.2/data02 -Z datanode -l logfile
pg_ctl start -D /var/lib/pgxl/9.2/coord01 -Z coordinator -l logfile
-- referred to http://files.postgres-xl.org/documentation/index.html

DEBUG
-- if all instances run on a single host, give each one its own ports (in each postgresql.conf)
port = 5432 ~ X
pooler_port = 6668 ~ Y
-- register each node with the others (this step is missing from the manual)
psql -c "EXECUTE DIRECT ON (coord01) 'CREATE NODE data01 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5433)'" postgres
psql -c "EXECUTE DIRECT ON (coord01) 'CREATE NODE data02 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5434)'" postgres
psql -c "EXECUTE DIRECT ON (data01) 'ALTER NODE data01 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5433)'" postgres
psql -c "EXECUTE DIRECT ON (data01) 'CREATE NODE data02 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5434)'" postgres
psql -c "EXECUTE DIRECT ON (data01) 'SELECT pgxc_pool_reload()'" postgres
psql -c "EXECUTE DIRECT ON (data02) 'CREATE NODE data01 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5433)'" postgres
psql -c "EXECUTE DIRECT ON (data02) 'ALTER NODE data02 WITH (TYPE = ''datanode'', HOST = ''localhost'', PORT = 5434)'" postgres
psql -c "EXECUTE DIRECT ON (data02) 'SELECT pgxc_pool_reload()'" postgres
-- referred to http://sourceforge.net/p/postgres-xl/tickets/18/
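
Once the nodes are registered, a quick sanity check can query the coordinator's pgxc_node catalog. A minimal sketch with psycopg2 (the port, database, and user names are assumptions; adjust to your setup):

import psycopg2

# Connect to the coordinator (assumed to listen on 5432) and list the registered nodes
conn = psycopg2.connect("dbname='postgres' user='postgres' host='localhost' port=5432")
cur = conn.cursor()
cur.execute("SELECT node_name, node_type, node_host, node_port FROM pgxc_node")
for row in cur.fetchall():
    print row
conn.close()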


Friday, August 8, 2014

Sample code for collecting data


1. Data collection using Python

# python library for pulling data out of html or xml
# http://www.crummy.com/software/BeautifulSoup/bs4/doc/index.html
-- pulling_data.py
import codecs
import urllib2
from bs4 import BeautifulSoup

f = urllib2.urlopen('http://www.daum.net')
html_doc = f.read()

soup = BeautifulSoup(html_doc)
# use codecs so Hangul (UTF-8) text is written correctly
with codecs.open('result_daum.txt','w',encoding='utf8') as f:
        for s in soup.body.strings:
                f.write(s)

# soup.strings iterates over every string in the document.
# soup.body.strings iterates only over strings that are descendants of <body>.

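A minimal sketch of the difference, reusing the same html_doc (the printed counts are only illustrative):

# Compare the two iterators: whole document vs. <body> only
soup = BeautifulSoup(html_doc)
all_strings = list(soup.strings)        # every string in the document
body_strings = list(soup.body.strings)  # only strings under <body>
print len(all_strings), len(body_strings)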

2. Ingest data, count each word, and write the result to PostgreSQL

import codecs
import urllib2
from bs4 import BeautifulSoup

# get a site page
site = 'http://www.auction.co.kr'

f = urllib2.urlopen(site)
html_doc = f.read()

result = []

soup = BeautifulSoup(html_doc)

# fetch the pages one level deeper (every page linked from the front page)
for link in soup.find_all('a'):
        link_tmp = link.get('href')
        try:
                f = urllib2.urlopen(link_tmp)
                html_doc = f.read()
                soup = BeautifulSoup(html_doc)
                for str in soup.body.strings:
                        result.append(str)
        except:
                # skip links that cannot be fetched (e.g. relative or broken URLs)
                pass

# count occurrences of each unique word
wordcount={}

for line in result:
        for word in line.split():
                if word not in wordcount:
                        wordcount[word] = 1
                else:
                        wordcount[word] += 1

f.close()

#with codecs.open('get.txt','w',encoding='utf8') as f:
#        for word,cnt in wordcount.items():
#                f.write("%s     %d\n" % (word,cnt))

# write the result to PostgreSQL (database 'ant')
import psycopg2

try:
    conn = psycopg2.connect("dbname='ant' user='ant' host='zoo' password='ant'")
except:
    print "I am unable to connect to the database"

cur = conn.cursor()

for word, cnt in wordcount.items():
        cur.execute("INSERT INTO commerce(tm,site,lev,word,cnt) VALUES (now(),%s,2,%s, %s)", (site,word,cnt,) )

conn.commit()
conn.close()
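
The INSERT above assumes a commerce table already exists. A possible definition, run once with the same cursor before the insert loop (the column types are my guess from the values being inserted, not taken from the original post):

# Create the assumed target table (types inferred from the INSERT statement)
cur.execute("""
    CREATE TABLE IF NOT EXISTS commerce (
        tm   timestamp,
        site text,
        lev  int,
        word text,
        cnt  int
    )
""")
conn.commit()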


3. Using the Twitter API with the Python tweepy library
import tweepy

consumer_key = '0EBFhXXXXXXXgcG9ouIGZ6l'
consumer_secret = 'spmIzBLO24MqEXXXXXXXXXXXX35K4FyUlLoAw'

access_token = '151351809-REDL20AXXXXXXXXXXXXXXGhsDRCjd9Y0jtrDH'
access_token_secret = 'lawyXXXXXXXXXXXXXXXXXXXlon0YmtwZTd'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print tweet.text
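
home_timeline() only returns the most recent page of tweets; tweepy's Cursor helper can page further back. A minimal sketch (the 200-item limit is arbitrary):

# Page through the home timeline instead of taking only the first page
for tweet in tweepy.Cursor(api.home_timeline).items(200):
    print tweet.text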

Sample code for handling data in the Hadoop framework

1. flume

-- fox.conf
# Name the components on this agent
# fox -> zoo -> koala
agent.sinks = koala
agent.sources = fox
agent.channels = zoo

# Describe/configure the source
agent.sources.fox.type = spooldir
agent.sources.fox.spoolDir = /home/flume/dump

# Describe the sink
agent.sinks.koala.type = hdfs
agent.sinks.koala.hdfs.path = /flume/events
agent.sinks.koala.hdfs.fileType = DataStream
agent.sinks.koala.hdfs.writeFormat = Text
agent.sinks.koala.hdfs.rollSize = 0
agent.sinks.koala.hdfs.rollCount = 10000

# Use a channel which buffers events in memory
agent.channels.zoo.type = file

# Bind the source and sink to the channel
agent.sources.fox.channels = zoo
agent.sinks.koala.channel = zoo

-- start the agent with the configuration file (fox.conf)
shell$ flume-ng agent --conf conf --conf-file fox.conf --name agent
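
The spooldir source only picks up files that are already complete in /home/flume/dump. A minimal sketch that stages a test file and then moves it into the spool directory (paths are assumptions; it also assumes /tmp and the spool directory are on the same filesystem, otherwise use shutil.move):

import os
import time

# Stage the file elsewhere first, then rename it into the spool directory,
# so the Flume spooldir source never reads a half-written file.
spool_dir = '/home/flume/dump'
staging_path = '/tmp/words_%d.txt' % int(time.time())

with open(staging_path, 'w') as f:
    f.write('hello flume\n')
    f.write('hello hdfs\n')

os.rename(staging_path, os.path.join(spool_dir, os.path.basename(staging_path)))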


2. hcatalog

hcat -e "create table koala (cnt bigint, wd string)"


3. pig

a = load '/flume/events/*';
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
c = group b by word;
d = foreach c generate COUNT(b) as cnt, group as wd;
store d into 'koala' using org.apache.hcatalog.pig.HCatStorer();


4. hive

select wd, cnt from koala order by cnt desc limit 10;

Popular Baby Names Top 50 since 1980

I am studying data analysis with R.
First I wondered how many people have used my name, among other things.

The word cloud gave the best visualization.


The histogram plot of my name, Mark.


R code tested with R version 3.1.1 and RStudio 0.98.978
# national popular baby names
# url : http://www.ssa.gov/oact/babynames/limits.html
# national data .zip

# get a file list
setwd("C:/Users/Mark/Downloads/names")
files<-list.files()
files<-files[grepl(".txt",files)]
files<-files[files!="NationalReadMe.pdf"]

# import data to data frame
fox <- NULL
for (i in 1:length(files))
{
  data <- read.csv(files[i], header=F)
  data["year"] <- substr(files[i],4,7)
  fox <- rbind(fox,data)
}

# assign column names to the data frame
colnames(fox) <- c('name','gender','cnt','year')

# total count per name
library(sqldf)
koala <- sqldf("select name,sum(cnt) as cnt from fox group by name")

# drawing a word cloud
library(wordcloud)
wordcloud( as.character(koala$name),as.integer(koala$cnt),
           scale=c(5,0.5), max.words=50, random.order=FALSE,
           rot.per=0.35, use.r.layout=FALSE,
           colors=brewer.pal(8, "Dark2"))

# draw a histogram of one name's count by year
koala <- sqldf("select year,sum(cnt) as cnt from fox where name = 'Mark' group by year")
plot (koala, type = 'h', ylab = 'Baby Name (Mark)s Count', col = 'Purple')