Monday, June 22, 2015

reset mysql root password (Mac OS 10.9)

completely stop the mysql server
sudo /usr/local/mysql/support-files/mysql.server stop

start mysql in safe mode, skipping the grant tables, so the password can be changed

sudo mysqld_safe --skip-grant-tables &

log in (no password is needed in this mode)
mysql -u root
use mysql;

update user set password=PASSWORD("xxxx") where User='root';
(on MySQL 5.7 the password column was replaced by authentication_string)
flush privileges;

start the mysql server normally (stop the safe-mode instance first if it is still running)
sudo /usr/local/mysql/support-files/mysql.server start

log in with the new password
mysql -u root -p

check users
select * from mysql.user;

Friday, June 19, 2015

Python: Unix timestamp to local time zone

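from datetime import datetime
import pytz
# convert a Unix timestamp (seconds since the epoch, UTC) to Pacific time
print(datetime.fromtimestamp(1352019599, pytz.timezone('America/Los_Angeles')))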
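
The same conversion can be sketched with the standard library only, assuming Python 3.9+ where the zoneinfo module is available (the helper name to_local is mine, not from any library). The timestamp above happens to be the last second before daylight saving time ended in Los Angeles in 2012, so the sketch also converts the following second to show the offset change.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_local(ts, tz):
    """Convert a Unix timestamp (seconds since the epoch, UTC) to an aware datetime in tz."""
    return datetime.fromtimestamp(ts, tz)


if __name__ == "__main__":
    pacific = ZoneInfo("America/Los_Angeles")  # needs the system tz database
    print(to_local(1352019599, timezone.utc))  # 2012-11-04 08:59:59+00:00
    print(to_local(1352019599, pacific))       # 2012-11-04 01:59:59-07:00
    print(to_local(1352019600, pacific))       # 2012-11-04 01:00:00-08:00
```

Passing the time zone straight to fromtimestamp avoids building a naive datetime first and localizing it afterwards.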

Sunday, December 7, 2014

Apache Spark Day One - Installation

Install Spark on Mac OS 10.9

1. install scala
On Mac OS, it is pretty easy with brew

brew install scala
(The initial installation failed; installing hadoop fixed it)

2. go to the spark path and build it
./sbt/sbt clean assembly

3. go to the Mac settings and enable remote login
if all steps are done successfully, start the master by 

The GUI is then available at localhost:8080

4. configuration
 - create at /conf/
  try 4 slaves: at the end of the file, add "export SPARK_WORKER_INSTANCES=4" to start 4 workers
   check the GUI; it shows the workers (you might need to enter ssh keys for each worker).

./bin/ #stop all masters and slaves
./bin/ #start all masters and slaves

-  configure logs, conf/

5. now, it is all set up, test python scripts by 

Monday, October 13, 2014

Connect Python to AWS -- boto

Boto is the Python module for interfacing with Amazon Web Services. According to the docs, all of its AWS features work with Python 2.6 and 2.7. I start with S3 on Python 2.7.

1. install boto (the easiest way is to use pip)

pip install boto

2. go to AWS and get access key and secret access key (set up both user and group)

3. run the example for testing

Before running it, set these in the source code:
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

4. try basic s3 functions

from boto.s3.connection import S3Connection
from boto.s3.key import Key

#set up the connection with AWS credentials
s3 = S3Connection(aws_access_key_id, aws_secret_access_key)

 #create a bucket (you can think of it as a cloud folder on S3)
b = s3.create_bucket('botobucket_1230')
  Note: a bucket name has to be unique across all buckets on AWS (similar to a URL). That means you have to come up with a name that has not been taken by anyone else.

 #create a Key object, which keeps track of data stored in S3 (you can think of it as a filename)
k = Key(b)
k.key = 'testKey'

#write content from a string to a key (a file in a bucket)
k.set_contents_from_string('this is a test string')

#check whether a key exists in a bucket: returns the key object if so, otherwise None
print(b.get_key('testKey'))
print(b.get_key('testKey2'))

#get all keys in the bucket
for key in b.list():
    print(key)

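#copy content from / to a local file
import os
source = 'path to localfile.txt'
source_size = os.stat(source).st_size # file size in bytes
print('%s %d' % (source, source_size))
k.set_contents_from_filename(source) # upload the local file to the key
k.get_contents_to_filename('path to downloaded copy.txt') # download it again (placeholder path)
Note: for large data sets, the FileChunkIO module can help to chunk the original file into smaller segments (see the boto docs).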
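
The chunking idea itself can be sketched without touching AWS. This is only an illustration of how a file size might be split into (offset, length) segments for a chunked reader; the function name chunk_ranges and the sizes used are made up for the example.

```python
def chunk_ranges(total_size, chunk_size):
    """Return (offset, length) pairs that cover total_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = (total_size + chunk_size - 1) // chunk_size  # ceiling division
    return [
        (i * chunk_size, min(chunk_size, total_size - i * chunk_size))
        for i in range(count)
    ]


# Hypothetical sizes: a 120-byte "file" split into 50-byte segments.
for offset, length in chunk_ranges(120, 50):
    print(offset, length)
```

Only the last segment can be shorter than the chunk size. In a real upload, total_size would be the source_size computed above.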

Tuesday, August 19, 2014

linear programming -- simplex method

I used to like Linear Programming at school, but I barely use it for real-world problems. However, it could probably offer some help with RTB problems. Well, forget about RTB, let's start with an easier example. (I made up this example to keep the calculation easy. Of course a real problem would be much more complicated.)

We have three factories (M1, M2, M3) that can make two types of products: A and B. M1 and M2 make product A; M1 and M3 make product B. Each A needs 2 hours at M1 and 4 hours at M2; each B needs 2 hours at M1 and 5 hours at M3. The profit is $2 for each product A and $3 for each product B. However, M1 can afford at most 12 hrs a day, M2 at most 16 hrs, and M3 at most 15 hrs. To make as much money as we can, how many units of product A and product B should we make each day?

factory   hr cost (product A)   hr cost (product B)   affordable hrs per day
M1        2                     2                     12
M2        4                     0                     16
M3        0                     5                     15

If we put it in a math way, with x1 = units of product A and x2 = units of product B made per day:
objective:    max  2*x1 + 3*x2
subject to         2*x1 + 2*x2 <= 12
                   4*x1        <= 16
                          5*x2 <= 15
                   x1, x2 >= 0

Of course it can be transformed into a min optimization through LP duality.
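
For reference, a sketch of that dual, with one variable per factory constraint (the names y1, y2, y3 are mine, not from the original post):

```latex
\begin{aligned}
\min\quad        & 12\,y_1 + 16\,y_2 + 15\,y_3 \\
\text{s.t.}\quad & 2\,y_1 + 4\,y_2 \ge 2 \\
                 & 2\,y_1 + 5\,y_3 \ge 3 \\
                 & y_1,\; y_2,\; y_3 \ge 0
\end{aligned}
```

The point y1 = 1, y2 = 0, y3 = 1/5 satisfies both constraints and gives 12 + 0 + 3 = 15, which matches the primal profit 2*3 + 3*3 = 15 at x1 = 3, x2 = 3, so both are optimal.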
Here I want to show the solution by the simplex method. First, let's turn all the conditions into equations by adding three slack variables x3, x4, x5 (all >= 0)
             2*x1 + 2*x2 + x3           = 12
             4*x1             + x4      = 16
                    5*x2           + x5 = 15

Then we put them in a matrix (the simplex tableau) and do the pivots.
pivot 1:
the purpose is to find the entering and departing variables that can improve the solution
- step 1: find the max coefficient in z_row (the one that contributes most to improving the solution). That is 3, so x2 is the entering variable (it becomes a basic variable)
- step 2: in the x2 column (where z=3), for each row i with a positive entry a(i), compute b(i)/a(i), where b(i) is the right-hand side of the row, and take the min. row_x3 gives 12/2 = 6 and row_x5 gives 15/5 = 3, so it is row_x5. So x5 is the departing variable (it becomes nonbasic)
detail of pivot 1:
 - row_x5 = row_x5/5
 - row_x3 = row_x3 - row_x5 * 2
 - row_z = row_z - row_x5*3
after pivot 1, x2 has moved to the left as a basic variable
in the same way, do pivot 2:

pivot 3

now, x1 and x2 are both basic variables on the left, so the solution is: when x1 = 3 and x2 = 3, the problem reaches its max value, 2*3 + 3*3 = 15 (Note: not all LP problems have a feasible solution).
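
The pivots above can be sketched in plain Python. This is a minimal tableau version of the two rules described (largest z_row coefficient enters, minimum ratio departs), assuming all right-hand sides are non-negative and with no protection against cycling; the function name simplex_max is made up for this example.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A.x <= b and x >= 0, assuming every b[i] >= 0.

    Returns (x, z): the values of the original variables and the objective.
    """
    m, n = len(A), len(c)
    # Each row is [original vars | slack vars | right-hand side].
    rows = []
    for i in range(m):
        slack = [Fraction(int(i == j)) for j in range(m)]
        rows.append([Fraction(v) for v in A[i]] + slack + [Fraction(b[i])])
    # z_row holds the objective coefficients; its last cell ends up as -z.
    z = [Fraction(v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]  # the slack variables start as basic

    while True:
        # Step 1: entering variable = column with the largest z_row coefficient.
        col = max(range(n + m), key=lambda j: z[j])
        if z[col] <= 0:
            break  # nothing left that can improve the objective
        # Step 2: departing variable = row with the minimum rhs / entry ratio.
        candidates = [i for i in range(m) if rows[i][col] > 0]
        if not candidates:
            raise ValueError("problem is unbounded")
        row = min(candidates, key=lambda i: rows[i][-1] / rows[i][col])
        # Pivot: scale the pivot row, then clear the column everywhere else.
        pivot = rows[row][col]
        rows[row] = [v / pivot for v in rows[row]]
        for i in range(m):
            if i != row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * p for a, p in zip(rows[i], rows[row])]
        factor = z[col]
        z = [a - factor * p for a, p in zip(z, rows[row])]
        basis[row] = col

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return x, -z[-1]


x, best = simplex_max([2, 3], [[2, 2], [4, 0], [0, 5]], [12, 16, 15])
print([str(v) for v in x], best)  # ['3', '3'] 15
```

Fractions keep the row operations exact, so the result can be compared directly with the hand calculation.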

Finally, to make life easier, python has a module for the simplex method.
Here is an example of PyGLPK using simplex method
Other resource: PuLP

Tuesday, July 22, 2014

a taste of Python Requests

I barely use REST APIs with Python, but recently I found a great (and easy) Python library for HTTP/REST APIs: Python Requests.

Installing it takes almost no effort, with pip or easy_install ("easy_install requests" on my Mac OS)

First try with GET
 import requests  
 r = requests.get('')  

GET with parameters
 import json
 param = {'user_id': 12345}
 r = requests.get('', params=param)
 #to get the content of the response
 print(r.text)
 # it can also be parsed as json data
 data = json.loads(r.text)
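
To see how the parameters end up in the request without calling a real API, the request can be prepared but not sent. This is a small sketch: the endpoint https://api.example.com/users is a placeholder of mine, and build_get is a hypothetical helper, not part of Requests.

```python
import requests

# Hypothetical endpoint, used only to build the request; nothing is sent.
BASE_URL = "https://api.example.com/users"


def build_get(url, params):
    """Prepare (but do not send) a GET request so its final URL can be inspected."""
    return requests.Request("GET", url, params=params).prepare()


prepared = build_get(BASE_URL, {"user_id": 12345})
print(prepared.method)  # GET
print(prepared.url)     # https://api.example.com/users?user_id=12345
```

The params dict is encoded into the query string, which is what requests.get does before sending.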

check the status code
 print(r.status_code)

for a bad request (4xx or 5xx status), raise an exception
 r.raise_for_status()

Session objects are pretty helpful
I used a Session to POST my login, so the auth cookies from authentication are carried through to later requests
 s = requests.Session()
 url = ''
 info = {'user': 'username', 'password': 'mypassword'}
 r = s.post(url, data=info)

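I got an error at the very beginning
 requests.exceptions.SSLError: hostname '' doesn't match
 either of '', ''

Requests verifies the host's SSL certificate for HTTPS requests by default, and here the certificate did not match the hostname. In this case we can set the verify flag to False, which skips the certificate check (so only do this for a host you trust)
 r = s.post(url, data=info, verify=False)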

ta-da!! it got through; then I can do GET, POST and DELETE to play with the data through the api

useful resource: handy cheat sheet for beginners

Thursday, July 10, 2014

D3 "translate", easier way to assign position

"transform" + "translate" seems an easier way to assign components' positions in D3. It works the same way as (dx, dy). Here is an example of drawing circles.

usually, components are located with specified dx, dy

A much simpler version can be done with "translate"

both actually do the same job
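
The original snippets are not shown here, so as a rough illustration this sketch builds the SVG markup that each approach would produce, written in Python rather than D3. It assumes the explicit version positions each circle with cx/cy attributes; the points, radius and function names are made up.

```python
# Hypothetical circle centres and radius, for illustration only.
POINTS = [(30, 40), (80, 40), (130, 40)]
RADIUS = 10


def circles_with_coordinates(points, r):
    """Position each circle through its own cx/cy attributes."""
    return ['<circle cx="%d" cy="%d" r="%d"/>' % (x, y, r) for x, y in points]


def circles_with_translate(points, r):
    """Draw each circle at the origin and move it with transform="translate(x,y)"."""
    return [
        '<circle r="%d" transform="translate(%d,%d)"/>' % (r, x, y)
        for x, y in points
    ]


def wrap_svg(elements, width=200, height=80):
    """Wrap the elements in a standalone SVG document that a browser can open."""
    body = "\n  ".join(elements)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">\n  %s\n</svg>'
        % (width, height, body)
    )


print(wrap_svg(circles_with_coordinates(POINTS, RADIUS)))
print(wrap_svg(circles_with_translate(POINTS, RADIUS)))
```

Both documents render the circles in the same places. The translate form has the advantage that the same attribute works for any element, including groups that have no cx/cy of their own.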