Pandas is a powerful analytics tool, but it can take a long time to run on large datasets.

PySpark is the Python API for Apache Spark, a distributed computing engine.

It lets Python users write pandas-style analytic functions while benefiting from Spark's faster distributed computation.

Use Python Flask to set up a lightweight web server and connect to it through an API.

Set up the web server:

from flask import Flask  # import the Flask class

app = Flask(__name__)  # create the Flask application object

@app.route('/')  # register the '/' URL under the main domain, e.g. http://www.Lawrence.com/
def function():  # the function defined after app.route runs whenever this URL is requested
    return 'Welcome to your first website!'

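Run the web server:

In Python, call app.run()

Or, in cmd, first set the FLASK_APP environment variable to the name of the .py file

set FLASK_APP=script_name.py

and then run

flask run

This starts the web server on the host so the page can be viewed in a browser.

(e.g. http://131.111.111.111:3000/)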
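The notes above mention connecting through an API. A minimal sketch of what that could look like is shown here, assuming a hypothetical /api/hello route that returns JSON (the route name and the payload are made up for illustration and are not part of the original notes). Flask's built-in test client is used so the endpoint can be called without starting a server.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/')
def index():
    # Same welcome page as in the notes above.
    return 'Welcome to your first website!'


@app.route('/api/hello')  # hypothetical route name, chosen for illustration
def hello():
    # jsonify builds a JSON response with the matching Content-Type header.
    return jsonify(message='hello')


if __name__ == '__main__':
    # The test client sends requests to the app without opening a network port.
    with app.test_client() as client:
        response = client.get('/api/hello')
        print(response.status_code, response.get_json())
```

Once the server is running with flask run, the same endpoint would be reachable from any HTTP client at the host address followed by /api/hello.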

Use Stack Overflow survey data to practice data analysis and build a predictive model to explore insights in the data.

In this project, we aim to use the Stack Overflow 2020 developer survey dataset to explore the following questions:

  1. Does a higher income guarantee better job satisfaction?
  2. What are the main factors that affect job satisfaction and salary?
  3. Do the factors that contribute most to job satisfaction and salary differ between groups (age, sex, race)?
  4. What are the most popular tools (database, development platform, web platform) used by developers with the highest degree?
  5. What is the mean time it takes developers to reach the highest degree?
  6. How often do most developers spend time on learning?

Dataset:

The dataset used in this project comes from the Stack Overflow annual developer survey held in 2020: https://drive.google.com/file/d/1dfGerWeWkcyQ9GX9x20rdSGj7WtEpzBB/view?usp=sharing

It contains 64,461 samples in total, covering 61 different questions about personal characteristics and developer topics.

Create a text file in a directory:

with open('directory/text.txt', 'w+') as file:

Write text to the file (indented inside the with block):

    file.write('I am a good person')
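Put together, the two steps above can be sketched as one small function that writes the text and reads it back to confirm the result. The helper name write_and_read and the temporary directory are assumptions added for illustration. They are not from the original notes.

```python
import tempfile
from pathlib import Path


def write_and_read(path: Path, text: str) -> str:
    """Write text to path, then read it back."""
    # open() does not create missing folders, so create the parent first.
    path.parent.mkdir(parents=True, exist_ok=True)
    # 'w' creates the file, or truncates it if it already exists.
    with open(path, 'w', encoding='utf-8') as file:
        file.write(text)
    # The file is closed automatically when the with block ends.
    with open(path, 'r', encoding='utf-8') as file:
        return file.read()


if __name__ == '__main__':
    # A temporary directory keeps the example from touching real files.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / 'directory' / 'text.txt'
        print(write_and_read(target, 'I am a good person'))
```

Mode 'w' is enough when the file is only written. 'w+' additionally allows reading from the same handle.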

Plotting and visualization:

Displaying Chinese text in matplotlib:

  1. Download msj.ttf into matplotlib's font folder
  2. Edit matplotlib's font configuration to add Microsoft JhengHei as the default font
  3. Add the following to the Python code:

plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']
plt.rcParams['axes.unicode_minus'] = False

Label formats for object detection:

Object detection model (Faster R-CNN):

Pascal VOC format:

<object>
  <name>missing</name>
  <pose>Unspecified</pose>
  <truncated>1</truncated>
  <difficult>0</difficult>
  <bndbox>
    <xmin>1459</xmin>
    <ymin>2</ymin>
    <xmax>1900</xmax>
    <ymax>46</ymax>
  </bndbox>
</object>

YOLO format: [class_id, object_x, object_y, object_width, object_height], where x and y are the box center and all four values are normalized by the image size

1 0.876842 0.023370 0.246316 0.044565
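The relationship between the two label formats can be sketched as a small conversion. Pascal VOC stores pixel corner coordinates, while YOLO stores a normalized center and size. The image size (2000 x 1000) and the class mapping below are hypothetical because the notes do not give them, so the printed line is not expected to match the YOLO example above.

```python
import xml.etree.ElementTree as ET


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO center/size values."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def voc_object_to_yolo(object_xml, class_ids, img_w, img_h):
    """Turn one Pascal VOC <object> element into a YOLO label line."""
    obj = ET.fromstring(object_xml)
    box = obj.find('bndbox')
    corners = [int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
    values = voc_box_to_yolo(*corners, img_w, img_h)
    class_id = class_ids[obj.find('name').text]
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


if __name__ == '__main__':
    sample = """<object><name>missing</name>
    <bndbox><xmin>1459</xmin><ymin>2</ymin><xmax>1900</xmax><ymax>46</ymax></bndbox>
    </object>"""
    # Image size and class mapping are assumptions, not values from the notes.
    print(voc_object_to_yolo(sample, {'missing': 1}, img_w=2000, img_h=1000))
```

In a full Pascal VOC file the image width and height are normally available in the size element, so they would not need to be supplied by hand.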

In Anaconda, we create virtual environments to isolate the specific dependencies of different projects.

Create environment:

conda create --name myenv

Activate environment:

conda activate myenv

Deactivate environment:

conda deactivate

List all environments:

conda env list

Delete environment:

conda env remove --name myenv


Windows command line:

Add a directory to the PATH environment variable:

For the current session only:

set PATH=directory;%PATH%

Permanently:

setx PATH "directory;%PATH%"

To view the system path:

echo %PATH%
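The same inspection can also be done from Python, which is handy when a script needs to check or extend PATH itself. This is a minimal sketch using only the standard library. The helper names path_entries and prepend_to_path are made up for illustration, and the change applies only to the running Python process and the child processes it starts.

```python
import os


def path_entries(path_value=None):
    """Split a PATH-style string into its directories."""
    if path_value is None:
        path_value = os.environ.get('PATH', '')
    # os.pathsep is ';' on Windows and ':' on Linux and macOS.
    return [entry for entry in path_value.split(os.pathsep) if entry]


def prepend_to_path(directory):
    """Put a directory at the front of PATH for this process only."""
    current = os.environ.get('PATH', '')
    os.environ['PATH'] = directory + os.pathsep + current if current else directory


if __name__ == '__main__':
    prepend_to_path('directory')  # placeholder name, as in the commands above
    for entry in path_entries():
        print(entry)
```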

Linux command line:

Decompress a .gz file:

gunzip file_name

Copy a file to a directory:

cp file_name target_directory

List details of all files in the current directory:

ll

Create a new folder in the current directory (also available on Windows):

mkdir folder_name

Communicating with a remote server:

via PuTTY

Connection: enter the server IP and connect, then enter the user account and password

Upload files:

林彥良

Self enthusiastic data science learner
