5 simple ways to troubleshoot using Strace

I keep being surprised how few people are aware of all the things they can use strace for. It’s always one of the first debug tools I pull out, because it’s usually available on the Linux systems I run, and it can be used to troubleshoot such a wide variety of problems.

What is strace?

Strace is quite simply a tool that traces the execution of system calls. In its simplest form it can trace the execution of a binary from start to end, and output a line of text with the name of the system call, the arguments and the return value for every system call over the lifetime of the process.
But it can do a lot more:

  • It can filter on a specific system call or group of system calls.
  • It can profile system call usage, tallying the number of times each system call is used, the time taken, and the number of successes and errors.
  • It traces signals sent to the process.
  • It can attach to any running process by pid.

If you’ve used other Unix systems, this is similar to “truss”. Another (much more comprehensive) tool is Sun’s DTrace.

How to use it

This is just scratching the surface, and in no particular order of importance:

1) Find out which config files a program reads on startup

Ever tried figuring out why some program doesn’t read the config file you thought it should? Had to wrestle with custom compiled or distro-specific binaries that read their config from what you consider the “wrong” location?
The naive approach:

$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {st_mode=S_IFREG|0664, st_size=40971, ...}) = 0

So this version of PHP reads php.ini from /usr/local/lib/php.ini (but it tries /usr/local/bin first).
The more sophisticated approach if I only care about a specific syscall:

$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4

The same approach works for a lot of other things. Have multiple versions of a library installed at different paths and wonder which one actually gets loaded? And so on.
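What the trace above exposes is an ordered search path: the program tries each candidate location in turn and uses the first open() that succeeds. A toy Python sketch of that pattern (the paths are hypothetical, modeled on the php.ini example above):

```python
import os

def find_config(candidates):
    """Try each candidate path in order, recording the same hits and
    misses that strace shows as successful/failed open() calls."""
    attempts = []
    for path in candidates:
        if os.path.exists(path):
            attempts.append((path, "ok"))
            return path, attempts
        attempts.append((path, "ENOENT"))
    return None, attempts

# Hypothetical search path; os.devnull stands in for the file that exists
found, log = find_config(["/nonexistent/php.ini", os.devnull])
```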

2) Why does this program not open my file?

Ever run into a program that silently refuses to read a file it doesn’t have read access to, and you only figure that out after swearing for ages because you thought it couldn’t find the file? Well, you already know what to do:

$ strace -e open,access yourprogram 2>&1 | grep your-filename

Look for an open() or access() syscall that fails.

3) What is that process doing RIGHT NOW?

Ever had a process suddenly hog lots of CPU? Or had a process seem to be hanging?
Then you find the pid, and do this:

root@dev:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL 
Process 15427 detached

Ah. So in this case it’s hanging in a call to futex(). Incidentally in this case it doesn’t tell us all that much – hanging on a futex can be caused by a lot of things (a futex is a locking mechanism in the Linux kernel). The above is from a normally working but idle Apache child process that’s just waiting to be handed a request.

But “strace -p” is highly useful because it removes a lot of guesswork, and often removes the need for restarting an app with more extensive logging (or even recompiling it).

4) What is taking time?

You can always recompile an app with profiling turned on, and for accurate information, especially about which parts of your own code are taking time, that is what you should do. But often it is tremendously useful to be able to just quickly attach strace to a process to see what it’s currently spending time on, especially when diagnosing problems. Is that 90% CPU use because it’s actually doing real work, or is something spinning out of control?

Here’s what you do:

root@dev:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.59    0.001014          48        21           select
  2.89    0.000031           1        21           getppid
  2.52    0.000027           1        21           time
------ ----------- ----------- --------- --------- ----------------
100.00    0.001072                    63           total

After you’ve started strace with -c -p you just wait for as long as you care to, and then exit with ctrl-c. Strace will spit out profiling data as above.

In this case, it’s an idle Postgres “postmaster” process that’s spending most of its time quietly waiting in select(). It calls getppid() and time() in between each select() call, which is a fairly standard event loop.

You can also run this “start to finish”, here with “ls”:

root@dev:~# strace -c >/dev/null ls
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.62    0.000205         103         2           getdents64
 18.78    0.000163          15        11         1 open
 15.09    0.000131          19         7           read
 12.79    0.000111           7        16           old_mmap
  7.03    0.000061           6        11           close
  4.84    0.000042          11         4           munmap
  4.84    0.000042          11         4           mmap2
  4.03    0.000035           6         6         6 access
  3.80    0.000033           3        11           fstat64
  1.38    0.000012           3         4           brk
  0.92    0.000008           3         3         3 ioctl
  0.69    0.000006           6         1           uname
  0.58    0.000005           5         1           set_thread_area
  0.35    0.000003           3         1           write
  0.35    0.000003           3         1           rt_sigaction
  0.35    0.000003           3         1           fcntl64
  0.23    0.000002           2         1           getrlimit
  0.23    0.000002           2         1           set_tid_address
  0.12    0.000001           1         1           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    0.000868                    87        10 total

Pretty much what you’d expect: it spends most of its time in two calls to read the directory entries (only two, since it was run on a small directory).

5) Why the **** can’t I connect to that server?

Debugging why some process isn’t connecting to a remote server can be exceedingly frustrating. DNS can fail, connect can hang, the server might send something unexpected back etc. You can use tcpdump to analyze a lot of that, and that too is a very nice tool, but a lot of the time strace will give you less chatter, simply because it will only ever return data related to the syscalls generated by “your” process. If you’re trying to figure out what one of hundreds of running processes connecting to the same database server does for example (where picking out the right connection with tcpdump is a nightmare), strace makes life a lot easier.

This is an example of a trace of “nc” connecting to www.news.com on port 80 without any problems:

$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\\24\\0\\0\\0\\26\\0\\1\\3\\255\\373NH\\0\\0\\0\\0\\0\\0\\0\\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\213\\321\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\213\\321\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1\\300\\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, [16]) = 153
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\\374\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\\374\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, 28) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\\\\2\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\\\\2\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("")}, [16]) = 106
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL)        = 1 (out [3])

So what happens here?

Notice the connection attempts to /var/run/nscd/socket? They mean nc first tries to connect to NSCD – the Name Service Cache Daemon – which is usually used in setups that rely on NIS, YP, LDAP or similar directory protocols for name lookups. In this case the connects fail.

It then moves on to DNS (DNS is port 53, hence the “sin_port=htons(53)” in the following connect calls). You can see it then does a sendto() call, sending a DNS packet that contains www.news.com, and then reads back a packet. For whatever reason it tries three times, the last with a slightly different request. My best guess why, in this case, is that www.news.com is a CNAME (an “alias”), and the multiple requests may just be an artifact of how nc deals with that.

Then in the end, it finally issues a connect() to the IP it found. Notice it returns EINPROGRESS. That means the connect was non-blocking – nc wants to go on processing. It then calls select(), which succeeds when the connection is established.
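That EINPROGRESS-then-select() sequence is the standard non-blocking connect idiom. Here is a small Python sketch of the same dance, connecting to a local listener so it is self-contained rather than hitting a real remote host:

```python
import errno
import select
import socket

# A local server to connect to, so the example runs anywhere
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.setblocking(False)                 # connect() will not block
rc = client.connect_ex(server.getsockname())
# On Linux this typically returns EINPROGRESS, just like nc's trace
assert rc in (0, errno.EINPROGRESS)

# select() reports the socket writable once the handshake completes,
# and SO_ERROR tells us whether the connect actually succeeded
_, writable, _ = select.select([], [client], [], 5)
err = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
client.close(); server.close()
```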

Try adding “read” and “write” to the list of syscalls given to strace and enter a string when connected, and you’ll get something like this:

read(0, "test\\n", 1024)                 = 5
write(3, "test\\n", 5)                   = 5
poll([{fd=3, events=POLLIN, revents=POLLIN}, {fd=0, events=POLLIN}], 2, -1) = 1
read(3, "\"-//IETF//"..., 1024) = 216
write(1, "\"-//IETF//"..., 216) = 216

This shows it reading “test” + linefeed from standard in, and writing it back out to the network connection, then calling poll() to wait for a reply, reading the reply from the network connection and writing it to standard out. Everything seems to be working right.

Other ideas?

I’d love to hear from you if you use strace in particularly creative ways. E-mail me (vidar@hokstad.com) or post comments.



by 井底之蛙, mp.weixin.qq.com, January 21




  • Understanding availability

1. Understand the goals

  • Lower the frequency: designing for high availability

  • Recover faster

  • A few lessons learned

2.1 Lower the frequency: reduce the number of failures.

2.2 Recover faster: shorten the failure recovery time.
Starting in 2013, deal-service (the deal/product system) would occasionally go down under a burst of heavy traffic (a big promotion or a regular campaign), roughly once every few months. Availability basically hovered around three nines. The order and payment systems stayed very stable, because there is a conversion rate from the deal detail page to orders: when traffic surged, the detail page went down first, so no traffic reached the order system. Later the static rendering of the detail pages was done fairly well, which sped up recovery and allowed degradation, but deal-service's subsystems were so deeply interdependent that end-to-end availability still could not be guaranteed.


Business growth can be absorbed at the application layer by scaling out, but the biggest single point, the database, was centralized. At this stage we mainly split the application's data access into reads and writes: the database provided more read replicas to solve the read problem, but writes remained the biggest bottleneck (MySQL reads can be scaled out, but write QPS tops out at around 20,000).

The system evolved into the architecture shown below, which could support an order volume of roughly 3,000 QPS.






2) When we used a cache, we found availability was very low at peak times. It turned out the cache servers sat in the same rack as the servers for Cat, the company's monitoring system; at peak, Cat consumed most of the network bandwidth, leaving very little for business traffic, which hurt the service.
















Degradation capability has to be worked out together with the product team, by looking at how degradation affects the user experience, starting with simple things like what message to show. Take our payment channels: if the Alipay channel is 50% down, a notice automatically appears next to the Alipay option saying the channel may be unstable, but it can still be clicked; if the Alipay channel is 100% down, the button is grayed out and cannot be clicked, with a prompt suggesting another payment channel (when WeChat Pay went down recently, this kicked in again). Another case: during the 917 big promotion, for certain dependencies, such as an integrity check, that are resource-intensive to evaluate but controllable, we could disable or enable them directly with a switch.





Estimate traffic top-down and capacity bottom-up: we found that one order submission involved 20 database accesses, with a 1:1 read/write ratio at peak. Working back from the database's capacity told us how much traffic the system should admit; we then made order submission asynchronous at the front end so the traffic flowed down to the database smoothly.

4. Reduce release risk

4.1 A strict release process


4.2 Gray (canary) releases


III. Recover faster: shorten the failure recovery time







2) Guarantee stable system throughput: if the system does a good job of traffic control and fault tolerance and maintains a stable throughput, it can stay available in most scenarios, digest traffic peaks quickly, and avoid failures that trigger repeated traffic spikes.



1) Mobile-first alerting: availability alerts should all go through channels that reliably reach a person, such as WeChat or SMS.


3) Visualized monitoring: our current requirement is to detect a failure within 1 minute and locate it within 3 minutes. That demands good monitoring visualization: instrument the methods in every key service and turn the data into monitoring curves, otherwise pinpointing the exact problem within 3 minutes is hard. Dianping's monitoring system, Cat, provides these metric curves well; on top of it we built some more real-time capabilities, for example the second-level QPS monitoring curves we developed for the order system.



IV. A few lessons learned








Original Page: http://mp.weixin.qq.com/s?__biz=MzIxMjE3NTg2NA==&mid=404115389&idx=1&sn=dc3d81583c4d2c3c07d6a563880235dd&scene=4#wechat_redirect



The Huawei HG8346R from Beijing Unicom was the first GPON ONT (optical modem) I ever dealt with, installed during the fiber upgrade in early July 2015. I assumed everything would be simple once the fiber was in, but it turned out the ONT was in router mode and customers only get an ordinary user account: the settings pages are just for looking at, and you cannot add common things like port forwarding or enable UPnP, which cripples PT/BitTorrent downloads. So I decided to crack it. I asked around on Taobao; most sellers said this ONT was very hard to crack, and the one who said he could do it wanted 80 yuan. I used to work in software, and I wasn't very busy lately, so better to rely on myself than on others. After twenty-odd days of research I finally cracked it, and along the way learned quite a bit about configuring and cracking Huawei ONTs.
The firmware version is V300R013C10S112.
① The hw_ctree.xml configuration file contains no super-administrator account, even after a long press of RESET to restore factory defaults. Unlike China Telecom ONTs, which expose a telecomadmin super-admin account after a factory reset, or other regions' Unicom ONTs, which have a CUAdmin account.
-<X_HW_WebUserInfo NumberOfInstances="1">
<X_HW_WebUserInfoInstance InstanceID="1" Enable="1" Password="h559fvxj" ModifyPasswordFlag="0" UserLevel="1" UserName="user"/>
As you can see, the web login has only a single user account; the password is stored in plain text and had never been changed.
-<X_HW_WebUserInfo NumberOfInstances="1">
<X_HW_WebUserInfoInstance InstanceID="1" Enable="1" Password="5930f2851b57a9d83d341882b97f66480abbce1821fe0bb55a592412175045fa" PassMode="2" ModifyPasswordFlag="1" UserLevel="1" UserName="user"/>
Here the web login still has only the user account, but the password is hashed with MD5+SHA256, which became the norm in decoded Huawei ONT configuration files for some time afterwards. MD5 and SHA-256 cannot be reversed, only brute-forced, so if the plaintext is a complex password, cracking it is practically impossible.
② Even after the configuration file is decrypted, the fields that store passwords remain in ciphertext, so you cannot read important secrets such as the VoIP authentication password just by decrypting hw_ctree.xml.
ManagementServer Password="$1I!xxEnCH.)l’k:*fsfL72g%!$" Username="cpe" X_HW_DSCP="0" X_HW_CertPassword="$16haQJY=:U’R`cLQi}aR>2g%!$" 
ConnectionRequestPassword="$17sy(~.fgSLvGQX3$2R<G2g%!$" ConnectionRequestUsername="acs"
+<SIP URI="+861063551238" AuthPassword="$15p’W*q[of4x|[dK*_RI+1g%!$" AuthUserName="+861063551238@bj.ims.chinaunicom.cn">
① The Huawei ONT maintenance enable tool
Since the Beijing Unicom HG8346R firmware is V300R013C10S112, a quick web search turns up the December 2014 version of the enable tool for download (this forum has it too, though it costs forum credits); it can open Telnet on this ONT. If your ONT is already on R013C10S121 or later, or on R015 firmware, you need a newer version of the ONT enable tool.
② tftpd32
Used to back up the configuration file; download it from: tftpd32.jounin.net.
③ A Windows tool for decrypting and encrypting Huawei configuration files
This tool processes Huawei's hw_ctree.xml under Windows: it can decrypt and decompress it into plain text, and encrypt and compress it back into its original form. It is equivalent to the aescrypt2 command inside the ONT.
④ An su password calculator (not needed for R015 firmware)
① Use the Huawei ONT enable tool to open Telnet.
② In the Telnet session (username: root, password: admin), run the backup cfg command to back up the ONT's configuration file hw_ctree.xml (together with tftpd32).
⒈ Once the crack is complete, import the slightly modified configuration file (mainly the super-user password part) back into the ONT. This restores every setting the ONT had before, so there is no need to worry about voice, IPTV, or Internet access breaking.
⒉ If you later swap in another Huawei ONT of broadly the same type, import the old configuration file into the new unit and all of the old settings are in place at once.
③ With the configuration file in hand, you can run the su command, enter the su password to elevate privileges, and then restore the Huawei web interface with a few commands.
④ Decrypt hw_ctree.xml with the decryption tool, then change UserLevel="1" to UserLevel="0", i.e. promote the user account to administrator level:
<X_HW_WebUserInfoInstance InstanceID="1" Enable="1" Password="h559fvxj" ModifyPasswordFlag="0" UserLevel="0" UserName="user"/>
The Password field can also be changed to a password of your choice. Then use the tool to encrypt and compress the file back into the form the ONT expects, so it can be imported into the ONT later.
Here C2 is the command-line encryption/decryption tool; guo.xml is the hw_ctree.xml exported from the ONT. C2 first decodes it into guojiemi.xml, which you edit in Notepad, then C2 encrypts it back into guojiami.xml.
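Once hw_ctree.xml is decrypted to plain text, the UserLevel change in step ④ is a single-attribute edit. A sketch with Python's ElementTree (element and attribute names are copied from the snippets above; the file handling around it is illustrative):

```python
import xml.etree.ElementTree as ET

# A minimal stand-in for the decrypted hw_ctree.xml fragment shown above
xml_text = ('<X_HW_WebUserInfo NumberOfInstances="1">'
            '<X_HW_WebUserInfoInstance InstanceID="1" Enable="1" '
            'Password="h559fvxj" ModifyPasswordFlag="0" '
            'UserLevel="1" UserName="user"/></X_HW_WebUserInfo>')

root = ET.fromstring(xml_text)
inst = root.find("X_HW_WebUserInfoInstance")
inst.set("UserLevel", "0")          # promote "user" to administrator level
patched = ET.tostring(root, encoding="unicode")
```

The patched text would then be re-encrypted with the C2 tool before importing it back into the ONT.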

Newer ONTs ship with firmware V300R015C10S109 or V300R015C10S111; the same method works on those too.

A cheat sheet of common Git commands

I use Git every day, but I can never remember many of the commands.


Below is my cheat sheet of common Git commands. A few terms of art used throughout:

  • Workspace: the working directory
  • Index / Stage: the staging area
  • Repository: the local repository
  • Remote: the remote repository


# Create a new Git repository in the current directory
$ git init

# Create a new directory and initialize it as a Git repository
$ git init [project-name]

# Clone a project along with its entire history
$ git clone [url]



# Show the current Git configuration
$ git config --list

# Edit the Git configuration file
$ git config -e [--global]

# Set the user information attached to commits
$ git config [--global] user.name "[name]"
$ git config [--global] user.email "[email address]"


# Add the specified files to the staging area
$ git add [file1] [file2] ...

# Add the specified directory, including subdirectories, to the staging area
$ git add [dir]

# Add all files in the current directory to the staging area
$ git add .

# Delete files from the working directory and stage the deletion
$ git rm [file1] [file2] ...

# Stop tracking the specified file, but keep it in the working directory
$ git rm --cached [file]

# Rename a file and stage the rename
$ git mv [file-original] [file-renamed]


# Commit the staging area to the repository
$ git commit -m [message]

# Commit the specified staged files to the repository
$ git commit [file1] [file2] ... -m [message]

# Commit changes in the working directory since the last commit directly to the repository
$ git commit -a

# Show all diff information when committing
$ git commit -v

# Replace the last commit with a new one
# (if nothing has changed, this rewrites the last commit message)
$ git commit --amend -m [message]

# Redo the last commit, including new changes to the specified files
$ git commit --amend [file1] [file2] ...


# List all local branches
$ git branch

# List all remote branches
$ git branch -r

# List all local and remote branches
$ git branch -a

# Create a new branch, but stay on the current branch
$ git branch [branch-name]

# Create a new branch and switch to it
$ git checkout -b [branch]

# Create a new branch pointing at the specified commit
$ git branch [branch] [commit]

# Create a new branch that tracks the specified remote branch
$ git branch --track [branch] [remote-branch]

# Switch to the specified branch and update the working directory
$ git checkout [branch-name]

# Set up tracking between an existing branch and a remote branch
$ git branch --set-upstream [branch] [remote-branch]

# Merge the specified branch into the current branch
$ git merge [branch]

# Cherry-pick a commit into the current branch
$ git cherry-pick [commit]

# Delete a branch
$ git branch -d [branch-name]

# Delete a remote branch
$ git push origin --delete [branch-name]
$ git branch -dr [remote/branch]


# List all tags
$ git tag

# Create a new tag at the current commit
$ git tag [tag]

# Create a new tag at the specified commit
$ git tag [tag] [commit]

# Show tag information
$ git show [tag]

# Push the specified tag
$ git push [remote] [tag]

# Push all tags
$ git push [remote] --tags

# Create a new branch pointing at a tag
$ git checkout -b [branch] [tag]


# Show files with changes
$ git status

# Show the commit history of the current branch
$ git log

# Show the commit history, with the files changed in each commit
$ git log --stat

# Show the history of a file, including renames
$ git log --follow [file]
$ git whatchanged [file]

# Show every diff related to the specified file
$ git log -p [file]

# Show who modified each line of a file, and when
$ git blame [file]

# Show the difference between the staging area and the working directory
$ git diff

# Show the difference between the staging area and the last commit
$ git diff --cached [file]

# Show the difference between the working directory and the latest commit of the current branch
$ git diff HEAD

# Show the differences between two commits
$ git diff [first-branch]...[second-branch]

# Show the metadata and content changes of a commit
$ git show [commit]

# Show the files changed in a commit
$ git show --name-only [commit]

# Show the contents of a file as of a commit
$ git show [commit]:[filename]

# Show recent commits of the current branch
$ git reflog


# Download all changes from the remote repository
$ git fetch [remote]

# List all remotes
$ git remote -v

# Show information about a remote
$ git remote show [remote]

# Add a new remote with a name
$ git remote add [shortname] [url]

# Fetch changes from the remote repository and merge them into the local branch
$ git pull [remote] [branch]

# Push the specified local branch to the remote repository
$ git push [remote] [branch]

# Force-push the current branch to the remote, even if there are conflicts
$ git push [remote] --force

# Push all branches to the remote repository
$ git push [remote] --all


# Restore the specified file from the staging area to the working directory
$ git checkout [file]

# Restore the specified file from a commit to the working directory
$ git checkout [commit] [file]

# Restore all files from the staging area to the working directory
$ git checkout .

# Reset the specified file in the staging area to match the last commit; the working directory is unchanged
$ git reset [file]

# Reset the staging area and working directory to match the last commit
$ git reset --hard

# Reset the current branch's pointer to the specified commit and reset the staging area; the working directory is unchanged
$ git reset [commit]

# Reset the current branch's HEAD to the specified commit, resetting both the staging area and working directory to match it
$ git reset --hard [commit]

# Reset HEAD to the specified commit, keeping the staging area and working directory unchanged
$ git reset --keep [commit]

# Create a new commit that reverts the specified commit
# (all of its changes are undone and applied to the current branch)
$ git revert [commit]


# Generate a release archive
$ git archive



Understand and fix MySQL mojibake (garbled text) in 10 minutes • cenalulu's Tech Blog





Before reading this article, if character set and encoding concepts are still fuzzy to you, I strongly recommend reading my earlier primer on the subject: "Understand character sets and character encodings in ten minutes".




The figure above shows three encode/decode steps (red arrows): client-side encoding, MySQL Server decoding, and conversion from the client encoding to the table encoding. The terminal can be a bash shell, a web page, or an app; in this article we assume bash is our terminal, i.e. the user-facing input and display layer. Each box in the figure corresponds to the following behavior:

  • Type input in the terminal via an input method
  • The terminal converts it into a byte stream according to its character encoding
  • The byte stream is transmitted to MySQL Server by the MySQL client
  • The server decodes it using character-set-client
  • The server checks whether character-set-client matches the target table's charset
  • If not, it converts from the client charset to the table charset
  • The converted bytes are stored in the data file


The figure above likewise shows three encode/decode steps (red arrows): client-side decoding for display, MySQL Server encoding according to character-set-client, and conversion from the table encoding to character-set-client.

  • Read the binary data stream from the file
  • Decode it using the table's character set
  • Convert the data to the character-set-client encoding
  • Encode it as a byte stream using character-set-client
  • The server sends it over the network to the remote client
  • The client displays the query result using the encoding configured for bash


1. The encodings of corresponding steps differ between write and read

It is obvious this causes mojibake. Number the three charsets used during the write C1, C2, C3 (figure one, left to right), and the three used during the read C1', C2', C3' (left to right). If bash used UTF-8 (C1) when writing but we read from a Windows terminal (GBK by default) as C1', the result is almost certainly garbled. Likewise, if we wrote with set names utf8 (C2) but read with set names gbk (C2'), the result will also inevitably be garbled.
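The C1/C1' mismatch is easy to reproduce outside MySQL: encode a string one way and decode the same bytes another way, and you get the classic mojibake. A small Python illustration:

```python
# Write side: the terminal (C1) encodes "中文" as UTF-8
data = "中文".encode("utf-8")

# Read side: a GBK terminal (C1') misinterprets the very same bytes.
# The bytes survive intact, but what is displayed is mojibake.
garbled = data.decode("gbk", errors="replace")
```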

2. The three steps within a single direction (write or read) use inconsistent encodings



master [localhost] {msandbox} (test) > create table charset_test_utf8 (id int primary key auto_increment, char_col varchar(50)) charset = utf8;
Query OK, 0 rows affected (0.04 sec)

master [localhost] {msandbox} (test) > set names gbk;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > insert into charset_test_utf8 (char_col) values ('中文');
Query OK, 1 row affected, 1 warning (0.01 sec)

master [localhost] {msandbox} (test) > show warnings;
| Level   | Code | Message                                                                   |
| Warning | 1366 | Incorrect string value: '\xAD\xE6\x96\x87' for column 'char_col' at row 1 |
1 row in set (0.00 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col from charset_test_utf8;
| id | hex(char_col)  | char_col |
|  1 | E6B6933FE69E83 | ?        |
1 row in set (0.01 sec)



  • The client-to-server decoding exists because MySQL must lexically and syntactically parse the incoming byte stream. Without decoding and validation, we couldn't even tell whether an incoming byte stream is an insert or an update.
  • The file-to-engine encode/decode exists so the engine knows where the character boundaries fall within the byte stream. A simple example: say we want the first two characters of a column and run something like select left(col,2) from table, and the storage engine reads the value E4B8ADE69687 from the file. If it splits that as GBK it gets three characters, E4B8, ADE6, 9687, and returns E4B8ADE6 to the client; if it splits it as UTF-8 it gets two characters, E4B8AD and E69687, and returns all of E4B8ADE69687. Clearly, without decoding the data read from the file, the storage engine could not perform character-level operations at all.






So what makes a conversion lossy rather than lossless? Suppose we convert a character X represented in encoding A into its representation in encoding B, and B's repertoire does not contain X: that conversion is lossy. Why would two encodings cover different sets of characters? If you have read my earlier primer, "Understand character sets and character encodings in ten minutes", or have a basic grasp of character encoding, you know that each character set supports a finite number of characters and that their coverage differs. The ranges representable by UTF-8 and GBK are as follows:

  • A GBK-encoded character takes values in the range 8140–FEFE, excluding **7E; about 27,000 characters in total
  • A UTF-8 encoded character's value range depends on its byte length, as shown in the table below:


The character in question is: ਅ GURMUKHI LETTER A, Unicode: U+0A05, UTF-8: E0 A8 85
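The lossy case is easy to demonstrate: GBK has no code point for ਅ (U+0A05), so any conversion has to substitute a replacement character, which is exactly the 0x3F ('?') that the hex dump in the example below shows. A Python sketch:

```python
ch = "\u0a05"                                   # ਅ GURMUKHI LETTER A
assert ch.encode("utf-8") == b"\xe0\xa8\x85"    # matches the UTF-8 bytes above

# GBK's repertoire does not contain this character...
try:
    ch.encode("gbk")
    lossless = True
except UnicodeEncodeError:
    lossless = False

# ...so a conversion must substitute something, here '?' (0x3F), losing data
replaced = ch.encode("gbk", errors="replace")
```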


master [localhost] {msandbox} (test) > create table charset_test_gbk (id int primary key auto_increment, char_col varchar(50)) charset = gbk;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > set names utf8;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > insert into charset_test_gbk (char_col) values ('ਅ');
Query OK, 1 row affected, 1 warning (0.01 sec)

master [localhost] {msandbox} (test) > show warnings;
| Level   | Code | Message                                                               |
| Warning | 1366 | Incorrect string value: '\xE0\xA8\x85' for column 'char_col' at row 1 |
1 row in set (0.00 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col,char_length(char_col) from charset_test_gbk;
| id | hex(char_col) | char_col | char_length(char_col) |
|  1 | 3F            | ?        |                     1 |
1 row in set (0.00 sec)




  • whether the character being converted exists in both character sets
  • whether the target character set can preserve the original form of characters it does not support



master [localhost] {msandbox} (test) > create table charset_test (id int primary key auto_increment, char_col varchar(50)) charset = latin1;
Query OK, 0 rows affected (0.03 sec)

master [localhost] {msandbox} (test) > set names latin1;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > insert into charset_test (char_col) values ('中文');
Query OK, 1 row affected (0.01 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col from charset_test;
| id | hex(char_col) | char_col |
|  2 | E4B8ADE69687  | 中文     |
2 rows in set (0.00 sec)

The flow is shown in the figure below. Clearly, by the time MySQL Server receives the data, the encodings are already inconsistent. But because the latin1 character set leaves characters outside its own repertoire untouched, preserving the original bytes, this behavior makes "garbage in, garbage out" possible.
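The "garbage in, garbage out" round trip works precisely because latin1 maps every byte value 0x00–0xFF to some character, so no byte is ever altered. A Python sketch of the full round trip:

```python
original = "中文"

# Store: UTF-8 bytes arrive, and the latin1 table keeps every byte as-is
stored = original.encode("utf-8").decode("latin-1")   # six latin1 characters

# Read back with the same (wrong) charset pair: the bytes survive unchanged,
# so the terminal still displays the text correctly
round_trip = stored.encode("latin-1").decode("utf-8")
```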


With the above understood, avoiding mojibake becomes easy. Keep the "trinity" consistent — the client, MySQL's character-set-client, and the table charset — and garbled text is guaranteed not to appear. For data that is already garbled, or has already gone through a lossy conversion, repair is comparatively difficult; the next section covers the specific methods in detail.




Whether you go by its syntax or its name, ALTER TABLE ... CHARSET=xxx looks like the cure-all for mojibake. In reality, it does nothing at all for data that is already damaged, and it cannot even change the default character set of columns that already exist in the table. See the following example:

master [localhost] {msandbox} (test) > show create table charset_test;
| Table        | Create Table                   |
| charset_test | CREATE TABLE `charset_test` (
  `char_col` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
1 row in set (0.00 sec)

master [localhost] {msandbox} (test) > alter table charset_test charset=gbk;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

master [localhost] {msandbox} (test) > show create table charset_test;
| Table        | Create Table                   |
| charset_test | CREATE TABLE `charset_test` (
  `char_col` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  PRIMARY KEY (`id`)
1 row in set (0.00 sec)



ALTER TABLE … CONVERT TO CHARACTER SET … is more destructive than method one because, per the official documentation, its purpose is to convert a table's data from one encoding to another. A short excerpt from the docs:

To change the table default character set and all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:

ALTER TABLE tbl_name

CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];

In practice, this statement only works on tables that are not currently garbled and were not saved through the garbage-in-garbage-out trick. For tables whose encodings are already wrong because of garbage-in-garbage-out, it makes things worse. Let's walk through a concrete example of what this SQL actually does and the damage it causes. Suppose we have a latin1 table into which UTF-8 data was stored garbage-in-garbage-out, yet which still displays correctly through the terminal — exactly the situation described in the garbage-in-garbage-out section above. After some time we notice the mistake and decide to change the table's character set to UTF-8 without breaking the display of the existing data. In that situation, alter table convert to character set has the following consequence:

master [localhost] {msandbox} (test) > create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset = latin1;
Query OK, 0 rows affected (0.01 sec)

master [localhost] {msandbox} (test) > set names latin1;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > insert into charset_test_latin1 (char_col) values ('这是中文');
Query OK, 1 row affected (0.01 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col,char_length(char_col) from charset_test_latin1;
| id | hex(char_col)            | char_col     | char_length(char_col) |
|  1 | E8BF99E698AFE4B8ADE69687 | 这是中文     |                    12 |
1 row in set (0.01 sec)

master [localhost] {msandbox} (test) > alter table charset_test_latin1 convert to character set utf8;
Query OK, 1 row affected (0.04 sec)
Records: 1  Duplicates: 0  Warnings: 0

master [localhost] {msandbox} (test) > set names utf8;
Query OK, 0 rows affected (0.00 sec)

master [localhost] {msandbox} (test) > select id,hex(char_col),char_col,char_length(char_col) from charset_test_latin1;
| id | hex(char_col)                                          | char_col                    | char_length(char_col) |
|  1 | C3A8C2BFE284A2C3A6CB9CC2AFC3A4C2B8C2ADC3A6E28093E280A1 | 这是中文                |                    12 |
1 row in set (0.00 sec)


The right way, option 1: Dump & Reload


  1. Dump the data to a file using the same garbage-in-garbage-out encodings
  2. Recreate the table with the correct character set
  3. Load the dumped file back into the new table


shell> mysqldump -u root -p -t --skip-set-charset --default-character-set=utf8 test charset_test_latin1 > data.sql
shell> mysql -uroot -p -e 'create table charset_test_latin1 (id int primary key auto_increment, char_col varchar(50)) charset = utf8' test
shell> mysql -uroot -p  --default-character-set=utf8 test < data.sql

The right way, option 2: Convert to Binary & Convert Back



mysql> ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col VARBINARY(50);
mysql> ALTER TABLE charset_test_latin1 MODIFY COLUMN char_col varchar(50) character set utf8;


  1. http://www.psce.com/blog/2015/03/03/mysql-character-encoding-part-2/
  2. http://www.qqxiuzi.cn/zh/hanzi-gbk-bianma.php
  3. http://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97%E5%86%85%E7%A0%81%E6%89%A9%E5%B1%95%E8%A7%84%E8%8C%83
  4. http://blog.csdn.net/ws84643557/article/details/6905167

Written by cenalulu(卢钧轶)


Updated March 05, 2015




Zeal’s Blog · On choosing a diamond


April 11, 2007 in Misc by zeal | 15 comments


Disclaimer: this article is nothing more than my own limited experience, offered to readers for reference only. It carries no authority of any kind and comes with no guarantee of accuracy. The author accepts no liability for any losses arising from reading it.



The most famous diamond metrics are the "four Cs": Color, Cut, Clarity, and Carat.

Carat is the measure of a diamond's weight; 1 carat equals 200 mg. Before trying to understand diamonds, most people's impression is simply that bigger is better, i.e. the more carats the better. That's not wrong, but it is far from enough to judge a diamond's quality. So let's talk about the other three Cs. Color and Clarity are natural properties of the stone; Cut is the work done to it afterwards.


  • Colorless: D, E, F. This is the top tier, meaning the diamond is colorless. D is the absolute best, but ordinary people basically cannot tell E or F apart from D.
  • Near Colorless: G, H, I, J. This tier is also decent; to the naked eye under normal viewing conditions it will usually look colorless. You can only tell a G or H apart by deliberately placing it on white paper next to a D or E. An I or J may be directly distinguishable.
  • K–M and N–Z are basically off the table, because the yellow tint is directly visible. If it's not for an engagement ring but just for fun, a yellow diamond might even have a certain fancy effect.


  • FL, IF (Flawless, Internally Flawless) are the best and can be considered pure.
  • VVS1, VVS2 have only very, very slight inclusions and can also be considered diamonds of very good clarity.
  • VS1, VS2 are "very slightly" included, cheaper than the tiers above, but the inclusions are invisible to the naked eye. In other words, under normal viewing conditions there is no difference.
  • SI1, SI2: the inclusions are visible under 10x magnification, and sharp-eyed people may spot them directly.
  • I1, I2, I3 are not recommended; call them rejects.

Color and Clarity are natural properties of the stone, with finely divided grades and considerable price differences between adjacent grades. For example, I found a price gap of about $800–$1,000 between VVS1 and VVS2. But as described above, for everyday wear an F will show no observable difference from a D or E, and even a G won't differ much — it's unnoticeable — yet the price difference is large. Likewise, VVS2 or VS1 is already plenty good; the difference between them and FL, IF, or VVS1 is essentially invisible, but the price difference is big. For ordinary wage earners, there is no need to spend money chasing numbers that exist only on paper. The other, more important reason is the fourth C: Cut.

Cut is the human work done to the stone, and it lacks the finely graded metrics of Color or Clarity. Cut covers two things: the shape and the workmanship. For shape, pick whatever you like; there is no better or worse. From what I've seen, though, for engagement rings quite a lot of people choose the simplest shape, the round.

Of all the metrics, I personally consider the cut by far the most important. A diamond's beauty comes mainly from the effect of how it reflects light, and given decent Color and Clarity, that is essentially determined by the quality of the cut. A D-color, FL-clarity stone with a poor cut will never look as beautiful as a G, VS1 stone cut superbly. That is why, as just discussed, it isn't worth paying for top-end color and clarity. If you want your money to count, the cut matters most. So how do you judge a cut? If you trust your own eye, go pick one yourself. But if you're the type who loses all judgment when faced with dozens of rings in a display case, or you're buying from an Internet store, you'll have to rely on the numbers. As just noted, though, cut has no universally agreed measure the way color and clarity do. What to do?

My view is that ordinary consumers can only trust third-party certification bodies, i.e. the diamond's certificate. Diamonds sold through reputable channels come with one. Two labs that inspect diamonds and issue certificates are considered authoritative: GIA and AGS (some merchants call the latter AGSL). Their certificates include much of the information just discussed — carat, color, clarity, a serial number, and so on.

Workmanship covers many aspects. Both labs give an overall grade simply called cut; GIA's and AGS's scales differ slightly:

  • From best to worst, GIA uses Excellent, Very Good, Good, Fair, Poor
  • AGS uses Ideal (0), Excellent (1), Very Good (2), Good (3,4), Fair (5,6,7), Poor (8,9,10)

As I said above, always choose the best cut you can: Excellent if the stone is GIA-certified, Ideal 0 if AGS-certified.

Beyond that overall grade there are several other workmanship metrics, and here AGS is more detailed than GIA. There is a claim (which I haven't verified) that AGS is pickier about workmanship than GIA, so if you particularly care about the cut, you should choose a stone with an AGS certificate.

GIA has two metrics measuring Finish, called Polish and Symmetry, graded on the same scale as cut, so aim for Excellent on both. Current AGS certificates break the cut grade into sub-scores beyond the overall grade just mentioned: Light Performance, Proportion Factors, and Finish, with Finish further divided into Polish and Symmetry. Polish and Symmetry are graded like cut; with AGS, aim for Ideal. AGS scores Light Performance, Proportion Factors, and Finish from 0 (best, Ideal) to 10 (worst, Poor). The ideal case is all three at zero — what AGS calls a "Triple Zero", the best possible cut — and it's what you should push for when buying. Within cut, proportion matters a great deal, and with an AGS certificate you can read the Proportion Factors score to judge it. GIA doesn't seem to have this metric, but both AGS and GIA certificates include a diagram from which you can read the proportions of each part of the stone. The two most important are depth and table. According to one salesman, depth is best between 58% and 62% and table between 53% and 57%; I haven't verified this claim.

Another metric is Fluorescence, the intensity of the blue glow the diamond emits under strong UV light. GIA's fluorescence grades, from least to most, are none, faint, medium, strong, and very strong; just pick none. AGS grades it Negligible, Medium Blue, Strong Blue, Very Strong Blue; pick Negligible.

In short, my personal view: carat depends on your budget; F color is definitely enough, and G works too; VVS2 or VS1 clarity should do; but the cut must be the best available. Since we non-experts can't judge a cut ourselves, rely on a third-party certificate and choose the top grade on every cut-related item.

Also: should you buy a famous brand, such as Tiffany or Cartier? That's a matter of personal taste. We once compared brands like Tiffany and Cartier with Blue Nile (an Internet store): for roughly the same specs, the branded diamond could cost several thousand dollars more. The advantage of an Internet store over a shop is vastly more choice and a much friendlier selection interface — you narrow your shortlist with a few mouse drags. The advantage of a shop is that you can see the actual stone, form your own impression, and choose a ring setting that suits you. Buying in a shop also leaves more room to negotiate on financing.

According to a friend of mine who bought a ring at Tiffany, Tiffany doesn't use GIA or AGS certification but issues its own certificate (that's branding for you!). If you go to a Tiffany store and ask about cut, the staff will tell you that all of their diamonds have the best cut. If you're set on Tiffany, look for descriptions of the metrics on Tiffany's own certificate.

Also, some merchants, like Blue Nile, provide GIA or AGS certificates but additionally re-inspect every stone themselves (mainly the cut) and select the ones they consider best. Blue Nile, for example, takes the stones GIA or AGS grades Excellent/Ideal, picks out what it judges the best subset, calls them Signature Ideal, and issues an additional certificate beyond GIA/AGS called a GCAL report. GCAL is presumably a diamond-grading company Blue Nile partners with. The GCAL report includes some information not found in AGS/GIA certificates, most importantly the Optical Light Performance Profile, which has two tests: Optical Brilliance Analysis and Optical Symmetry Analysis, each graded Excellent, Very Good, Good, or Fair. If both are Excellent, the stone's light performance will be outstanding, which means it will be beautiful and dazzling.




Growth@Airbnb: it started with a data crisis – V2EX




  •   valleyvagabond · 21 minutes ago · 18 views


    I can still vaguely remember what Airbnb looked like when I joined in 2012. No fancy office building, no global host conference; Beijing was just a marker on the map to us. The team of engineers, designers, and product managers numbered about 50, and management was so flat that every engineer reported directly to Nate, the co-founder and CTO. Team boundaries were fluid and followed projects: if something needed doing in a given quarter, or even a given month, a temporary team would form within 24 hours to do it, then disband when the project was done. This seemingly chaotic management style carried the company through its years of fastest growth. It gave the team enormous flexibility and greatly spurred their creativity. Sometimes I think that when you're walking a road no one has walked before, rapidly trying every possibility may be a perfectly sensible form of growth hacking for that period.



    To change the status quo, the first thing we did was make data collection effortless. The goal was for every application engineer to collect data as a matter of habit from the very start of a project. We quickly built the first version of the system, which had three parts: a logging component, log servers, and a data pipeline. The logging component was a wrapper around the service provided by the underlying log servers, exposing a simple, general interface: with one line of code an application engineer could record any event he wanted, without caring where the log servers were, where logs were stored, or what happened on error. This simple logging component unified data collection across the website and mobile. The log servers formed a small distributed cluster; each application server automatically found a healthy log server and shipped logs to it over a REST API. The data ultimately landed in AWS S3, where a data pipeline running on EMR processed it in various ways and finally exported it to a conventional SQL database for analysis. The system worked for a while, and more and more teams adopted it, but the growing data volume exposed weaknesses in its design. Small data-loss incidents happened from time to time and were very hard to track down. Fortunately, around that time Kafka, developed under LinkedIn's lead, released version 0.8. A system the internal team had once rejected for its weak fault tolerance came back into my view with that release's major improvements. I pushed hard to build the new log service on Kafka, and we quickly shipped a Kafka-based version. The results proved that the new system was an order of magnitude better than the old one in both performance and reliability: data loss never happened again, and operating the system became far simpler.
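    Airbnb's actual logging component is internal and not public. Purely as an illustration of the "one line of code per event" interface described above, here is a minimal JavaScript sketch; every name in it (logEvent, sink) is my own invention, and the network transport to the log-server cluster is stubbed out with an in-memory buffer so the sketch is self-contained:

```javascript
// Hypothetical one-line event logger. Callers never see transport details:
// in the real system this would buffer events and ship them to a healthy
// log server over REST; here the "transport" is an in-memory array.
const sink = [];

function logEvent(name, payload = {}) {
  // One line for the caller; discovery, batching, and retries would
  // all live behind this call in a production component.
  sink.push({ name, payload, ts: Date.now() });
}

logEvent('search_performed', { query: 'Paris', results: 42 });
logEvent('listing_viewed', { listingId: 'abc123' });

console.log(sink.length); // 2
```

    The point of the design is that the cost of instrumenting code drops to near zero, so engineers actually do it from day one.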

    Besides the challenges in data collection, we ran into all sorts of problems with our analysis tools. In the earliest days all analysis was done by CTO Nate alone on a single SQL server. Later, as the data science team was built up, everyone kept the habit of analyzing on that one SQL box. As more and more teams needed data, and with everyone writing queries rather carelessly, the RDS instance buckled under the load and deadlocks became common. To fix this, and to keep our analysis tools ahead of the growth in data and users, we set out to build our own big data platform. The first version was simple: essentially a thin wrapper around the EMR API, with the data scientists using a crontab to schedule data-processing jobs. This obviously had many problems. First, data had to be shuttled between S3 and EMR's HDFS, and the I/O cost was steep. Second, if one task in a crontab-scheduled sequence failed, we had to re-run the whole sequence from the start. Third, EMR was a black box, and debugging it was hard.

    Having learned our lesson, we decided to build our big data platform on Mesos. If you are not familiar with Mesos, think of it as an operating system for a cluster of Linux servers: to its users the cluster looks like a single machine, while Mesos handles resource scheduling and task execution inside the cluster. We chose Mesos, first, because it painted a very promising future for distributed services (even though it was still quite immature when we adopted it); second, because someone on the team had been running Mesos at Twitter, where a great many services were already on it; and third, as an engineer, what is more exciting than trying the coolest technology? The investment paid off handsomely: data became readily available across the organization, and people no longer scratched their heads when decisions needed data behind them. On top of Mesos we built a job scheduler called Chronos, with which we could freely create chains of interdependent computation tasks; if a task in a chain failed, Chronos could recover intelligently and raise alerts. Mesos also provided a UI that helped us debug failed jobs. Previously we had to use very crude shell scripts to grab logs from individual EMR nodes, which made debugging exceptionally painful. Mesos ran not only Hadoop and Kafka but many internal services as well, and, remarkably, these entirely different services ran dynamically on the same cluster without interfering with each other. (To learn more about Mesos, have a look at the company Mesosphere.) For users not comfortable with Hadoop tooling, we also brought in AWS Redshift, which greatly improved data users' productivity (see my post on the Airbnb engineering blog for details). We also experimented early with Spark/Shark from Berkeley's AMP Lab, but since most of our data scientists only had SQL backgrounds, and the system was immature at the time, we reluctantly shelved it after a short trial. (Spark and Shark have become very good since; if you do machine learning, it is worth looking into this technology and into Databricks, the company founded by that Berkeley team.)




    A React Tutorial by Example

    The hottest front-end framework today is, without question, React.

    Last week React Native, built on React, was released; it collected 5,000 stars within a single day, which shows how much attention it draws.

    React originated as an internal Facebook project: dissatisfied with every JavaScript MVC framework on the market, the company decided to write its own, to build the Instagram website. It turned out to work so well that Facebook open-sourced it in May 2013.

    Because React's design is so unusual, revolutionary even, with outstanding performance yet very simple code logic, more and more people are paying attention to it and using it, and many believe it may become the mainstream tool for web development.

    The project itself keeps snowballing, growing from a UI engine into a full front-and-back-end web app solution. The derived React Native project is even more ambitious: it aims to let you write native apps the way you write web apps. If that works out, the whole internet industry will be upended, since one team writing a UI once could run it on servers, browsers, and phones (see 《也许,DOM 不是答案》).

    With React this hot and this promising, it is clearly worth learning. Technically, it satisfies curiosity and raises your skill level; career-wise, it helps with job hunting and promotion, and with joining high-potential projects. Good React tutorials are hard to find, though, partly because the technology is so new that nobody has much experience yet and everyone is still feeling their way, and partly because React itself keeps changing, with the API constantly in flux and no 1.0 release so far.

    Learning React frustrated me. Some tutorials dwell on details that don't help a beginner; others are well written but too short to give you the full picture. I studied on and off for several months, read twenty-odd tutorials, and along the way collected every demo that helped me into a repository, React Demos.

    Based on that repository, here is a React tutorial that aims to be both comprehensive and easy to follow. Work through each demo and you will have a basic grasp of React. The only prerequisite is basic JavaScript and DOM knowledge, and as you read you will find that React really demands very little background.


    React's installation package can be downloaded from the official site. But React Demos already ships with the React source, so there is nothing to install; just copy the repository to your disk.

    $ git clone git@github.com:ruanyf/react-demos.git

    If you don't have git installed, download the zip archive directly.

    The ten examples explained below live in the demo subdirectories; each directory contains an index.html file. Open it in a browser (usually a double-click will do) and you will see the effect immediately.

    Note that React can run in the browser or on the server, but this tutorial covers only the browser, partly to keep things simple and partly because React's syntax is the same everywhere, so server-side usage differs little from browser usage. Demo11 is an example of server-side first-screen rendering; interested readers can study its source on their own.

    1. HTML Template

    The source of a web page that uses React looks roughly like this.

    <!DOCTYPE html>
    <html>
      <head>
        <script src="../build/react.js"></script>
        <script src="../build/JSXTransformer.js"></script>
      </head>
      <body>
        <div id="example"></div>
        <script type="text/jsx">
          // ** Our code goes here! **
        </script>
      </body>
    </html>

    Two things in the code above deserve attention. First, the type attribute of the last script tag is text/jsx. This is because React's own JSX syntax is not compatible with JavaScript; wherever you use JSX, you must add type="text/jsx".

    Second, React provides two libraries, react.js and JSXTransformer.js, which must be loaded first. JSXTransformer.js translates JSX syntax into JavaScript. This step is time-consuming, so when you actually go to production it should be done on the server ahead of time.

    $ jsx src/ build/

    The command above converts the JS files under the src subdirectory and puts all the transformed files in the build subdirectory.


    2. React.render()

    React.render is React's most basic method; it turns a template into HTML and inserts it into the specified DOM node.

    React.render(
      <h1>Hello, world!</h1>,
      document.getElementById('example')
    );

    The code above inserts an h1 heading into the example node (see demo01).

    3. JSX Syntax

    In the previous section's code, HTML is written directly inside JavaScript, without any quoting. This is JSX syntax, which allows HTML and JavaScript to be mixed (see Demo02).

    var names = ['Alice', 'Emily', 'Kate'];

    React.render(
      <div>
        {
          names.map(function (name) {
            return <div>Hello, {name}!</div>
          })
        }
      </div>,
      document.getElementById('example')
    );

    The code above shows the basic JSX rule: when the parser meets an HTML tag (starting with <), it parses it with HTML rules; when it meets a code block (starting with {), it parses it with JavaScript rules.

    JSX lets you insert JavaScript variables directly into a template. If the variable is an array, JSX expands all of its members (see demo03).

    var arr = [
      <h1>Hello world!</h1>,
      <h2>React is awesome</h2>,
    ];
    React.render(
      <div>{arr}</div>,
      document.getElementById('example')
    );

    The arr variable above is an array, so JSX adds all of its members to the template.
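    Setting React aside for a moment, the list-building pattern used in Demo02 and demo03 is ordinary Array.prototype.map. A standalone plain-JavaScript sketch of the same idea, with strings standing in for the JSX elements:

```javascript
// Produce one greeting per name, the same shape the JSX template
// builds as <div> elements, here as plain strings.
var names = ['Alice', 'Emily', 'Kate'];

var greetings = names.map(function (name) {
  return 'Hello, ' + name + '!';
});

console.log(greetings); // [ 'Hello, Alice!', 'Hello, Emily!', 'Hello, Kate!' ]
```

    Once you see that JSX code blocks are just JavaScript expressions, the array-expansion behaviour stops being magic.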


    4. Components

    React lets you encapsulate code into components and then insert a component into a page the way you insert an ordinary HTML tag. The React.createClass method generates a component class (see demo04).

    var HelloMessage = React.createClass({
      render: function() {
        return <h1>Hello {this.props.name}</h1>;
      }
    });

    React.render(
      <HelloMessage name="John" />,
      document.getElementById('example')
    );

    In the code above, the variable HelloMessage is a component class. When the template inserts <HelloMessage />, an instance of HelloMessage is generated automatically (below, "component" always means an instance of a component class). Every component class must have its own render method, which outputs the component.

    Components are used exactly like native HTML tags and can take arbitrary attributes: <HelloMessage name="John" /> gives the HelloMessage component a name attribute with the value John. A component's attributes are available on the component class's this.props object; the name attribute, for example, is read as this.props.name.

    One caveat when adding component attributes: class must be written as className, and for as htmlFor, because class and for are reserved words in JavaScript.


    5. this.props.children

    The properties of this.props correspond one-to-one with the component's attributes, with one exception: this.props.children, which holds all of the component's child nodes (see demo05).

    var NotesList = React.createClass({
      render: function() {
        return (
          <ol>
            {
              this.props.children.map(function (child) {
                return <li>{child}</li>
              })
            }
          </ol>
        );
      }
    });

    React.render(
      <NotesList>
        <span>hello</span>
        <span>world</span>
      </NotesList>,
      document.getElementById('example')
    );

    The NotesList component above has two span child nodes, both readable through this.props.children.

    Note that this.props.children is an array only when there is more than one child node; otherwise calling map on it will throw an error.


    6. Getting the Real DOM Node

    A component is not a real DOM node but a data structure held in memory, called the virtual DOM. Only when it is inserted into the document does it become a real DOM node. By React's design, all DOM changes happen on the virtual DOM first; then the parts that actually changed are reflected in the real DOM. This algorithm is called DOM diff, and it can greatly improve a page's performance.

    Sometimes, though, you need the real DOM node behind a component, and for that you use the React.findDOMNode method (see demo06).

    var MyComponent = React.createClass({
      handleClick: function() {
        React.findDOMNode(this.refs.myTextInput).focus();
      },
      render: function() {
        return (
          <div>
            <input type="text" ref="myTextInput" />
            <input type="button" value="Focus the text input" onClick={this.handleClick} />
          </div>
        );
      }
    });

    React.render(
      <MyComponent />,
      document.getElementById('example')
    );

    In the code above, the MyComponent component's children include a text input used to capture user input. For that we must get at the real DOM node, since the virtual DOM cannot see the user's input. To make this possible, the text input carries a ref attribute; this.refs.[refName] then points at that virtual DOM child, and finally React.findDOMNode retrieves the real DOM node.

    Note that because React.findDOMNode returns the real DOM node, it can only be called after the virtual DOM has been inserted into the document; otherwise it returns null. The code above guarantees this by calling React.findDOMNode only inside the component's Click handler, i.e. only after a Click event has occurred on the real DOM.

    React components support many events besides Click, such as KeyDown, Copy, and Scroll; see the official documentation for the complete list.


    7. this.state

    Components inevitably interact with the user. One of React's big innovations is to treat a component as a state machine: it starts in an initial state, user interaction causes state changes, and each change triggers a re-render of the UI (see demo07).

    var LikeButton = React.createClass({
      getInitialState: function() {
        return {liked: false};
      },
      handleClick: function(event) {
        this.setState({liked: !this.state.liked});
      },
      render: function() {
        var text = this.state.liked ? 'like' : 'haven\'t liked';
        return (
          <p onClick={this.handleClick}>
            You {text} this. Click to toggle.
          </p>
        );
      }
    });

    React.render(
      <LikeButton />,
      document.getElementById('example')
    );

    The code above defines a LikeButton component whose getInitialState method defines the initial state: an object that can be read through this.state. When the user clicks the component and the state changes, this.setState updates the state value; every update automatically calls this.render and renders the component again.

    Since this.props and this.state both describe a component's characteristics, they are easy to confuse. A simple rule of thumb: this.props holds characteristics that are fixed once defined and never change, while this.state holds characteristics that change as the user interacts.


    8. Forms

    What the user types into a form is part of the user's interaction with the component, so it cannot be read through this.props (see demo08).

    var Input = React.createClass({
      getInitialState: function() {
        return {value: 'Hello!'};
      },
      handleChange: function(event) {
        this.setState({value: event.target.value});
      },
      render: function () {
        var value = this.state.value;
        return (
          <div>
            <input type="text" value={value} onChange={this.handleChange} />
            <p>{value}</p>
          </div>
        );
      }
    });

    React.render(<Input/>, document.body);

    In the code above, the value of the text input cannot be read from this.props.value; instead you define an onChange callback and read the user's input from event.target.value. textarea, select, and radio elements all work the same way; see the official documentation for more.



    9. Component Lifecycle

    A component's lifecycle falls into three states:

    • Mounting: inserted into the real DOM
    • Updating: being re-rendered
    • Unmounting: removed from the real DOM

    React provides two handler functions for each state: a will function, called just before entering the state, and a did function, called just after. Across the three states that makes five handlers in all.

    • componentWillMount()
    • componentDidMount()
    • componentWillUpdate(object nextProps, object nextState)
    • componentDidUpdate(object prevProps, object prevState)
    • componentWillUnmount()

    In addition, React provides handlers for two special cases.

    • componentWillReceiveProps(object nextProps): called when a mounted component receives new props
    • shouldComponentUpdate(object nextProps, object nextState): called when a component decides whether to re-render

    See the official documentation for detailed descriptions of these methods. Here is an example (see demo09).

    var Hello = React.createClass({
      getInitialState: function () {
        return {
          opacity: 1.0
        };
      },

      componentDidMount: function () {
        this.timer = setInterval(function () {
          var opacity = this.state.opacity;
          opacity -= .05;
          if (opacity < 0.1) {
            opacity = 1.0;
          }
          this.setState({
            opacity: opacity
          });
        }.bind(this), 100);
      },

      render: function () {
        return (
          <div style={{opacity: this.state.opacity}}>
            Hello {this.props.name}
          </div>
        );
      }
    });

    React.render(
      <Hello name="world"/>,
      document.body
    );

    In the code above, once the Hello component has mounted, componentDidMount sets a timer that resets the component's opacity every 100 milliseconds, triggering a re-render each time.




    You may wonder why the style attribute is written with two sets of braces:

    style={{opacity: this.state.opacity}}

    This is because a React component's style is an object: the first pair of braces says "this is JavaScript syntax", and the second pair is the style object itself.


    10. Ajax

    A component's data usually comes from the server via an Ajax request. You can issue the request in componentDidMount and, once it succeeds, use this.setState to re-render the UI (see demo10).

    var UserGist = React.createClass({
      getInitialState: function() {
        return {
          username: '',
          lastGistUrl: ''
        };
      },

      componentDidMount: function() {
        $.get(this.props.source, function(result) {
          var lastGist = result[0];
          if (this.isMounted()) {
            this.setState({
              username: lastGist.owner.login,
              lastGistUrl: lastGist.html_url
            });
          }
        }.bind(this));
      },

      render: function() {
        return (
          <div>
            {this.state.username}'s last gist is
            <a href={this.state.lastGistUrl}>here</a>.
          </div>
        );
      }
    });

    React.render(
      <UserGist source="https://api.github.com/users/octocat/gists" />,
      document.getElementById('example')
    );

    The code above uses jQuery for the Ajax request, but only for convenience of exposition; React has no dependencies, and any other library works just as well.


    11. References

    1. React’s official site
    2. React’s official examples
    3. React (Virtual) DOM Terminology, by Sebastian Markbåge
    4. The React Quick Start Guide, by Jack Callister
    5. Learning React.js: Getting Started and Concepts, by Ken Wheeler
    6. Getting started with React, by Ryan Clark
    7. React JS Tutorial and Guide to the Gotchas, by Justin Deal
    8. React Primer, by Binary Muse
    9. jQuery versus React.js thinking, by zigomir





    A tail call is an important concept in functional programming. This article explains what it means and how to use it.



    A tail call simply means that the last step of a function is a call to another function:

    function f(x){
      return g(x);
    }

    In function f above, the last step is to call g; that is a tail call.



    The following two cases, for example, are not tail calls:

    // Case 1
    function f(x){
      let y = g(x);
      return y;
    }

    // Case 2
    function f(x){
      return g(x) + 1;
    }

    In case 1, f still does something after calling g (it returns y); in case 2, there is still an addition after the call.



    A tail call need not appear at the textual end of a function:

    function f(x) {
      if (x > 0) {
        return m(x);
      }
      return n(x);
    }

    Here both m and n are tail calls, since each is the last step on its branch.




    We know that a function call creates a "call record" in memory, also known as a call frame, which stores information such as the call site and internal variables. If function A calls function B inside its body, a call frame for B is created on top of A's; only when B finishes and returns its result to A does B's frame disappear. If B in turn calls function C, there is a frame for C as well, and so on. All the call frames together form the "call stack".


    A tail call is special precisely because of its position: since it is the outer function's last step, none of the outer frame's information is needed any longer, so the inner function's call frame can simply replace the outer one.

    function f() {
      let m = 1;
      let n = 2;
      return g(m + n);
    }
    f();

    // is equivalent to
    function f() {
      return g(3);
    }
    f();

    // is equivalent to
    g(3);

    In the code above, if g were not a tail call, f would need to keep the values of its internal variables m and n, g's call site, and so on. But since f ends as soon as g is called, by the last step we can delete f's call record entirely and keep only the call record of g(3).

    This is called "tail call optimization": keeping only the inner function's call record. If all function calls were tail calls, we could get away with exactly one call record on the stack at any time, which would save a great deal of memory. That is the point of tail call optimization.



    A function that calls itself is recursive; a tail call to itself is tail recursion.

    Recursion can be very memory-hungry, since hundreds or thousands of call records must be kept at once, making "stack overflow" errors easy to hit. With tail recursion, however, only one call record ever exists, so a stack overflow can never happen.
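    The stack limit is easy to observe in an engine that does not perform the optimization. A small sketch (runnable in Node or a browser console; probeDepth is my own name, not from this article) that measures the maximum plain-recursion depth by catching the overflow error:

```javascript
// Recurse until the engine's call stack overflows, then report how deep we got.
function probeDepth(n) {
  try {
    return probeDepth(n + 1);
  } catch (e) {
    // Typically "RangeError: Maximum call stack size exceeded"
    return n;
  }
}

console.log('max plain recursion depth ≈ ' + probeDepth(0));
```

    The exact number varies by engine and frame size, but it is finite; a tail-call-optimized recursion would have no such ceiling.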

    function factorial(n) {
      if (n === 1) return 1;
      return n * factorial(n - 1);
    }

    factorial(5) // 120

    The code above is a factorial function; computing the factorial of n requires keeping up to n call records, for O(n) complexity.

    Rewritten as a tail recursion, it keeps only one call record, for O(1) complexity.

    function factorial(n, total) {
      if (n === 1) return total;
      return factorial(n - 1, n * total);
    }

    factorial(5, 1) // 120

    Clearly, tail call optimization matters a great deal for recursive operations, which is why some functional languages have written it into their language specification. ES6 does the same: for the first time, it explicitly requires that all ECMAScript implementations deploy tail call optimization. In ES6, then, as long as you use tail recursion, stack overflow cannot occur, and memory is saved.


    Implementing tail recursion usually means rewriting the recursive function so that its very last step calls only itself. The way to do that is to turn all the internal variables it uses into parameters of the function. In the example above, the factorial function needs an intermediate variable total, so we make that variable a parameter. The drawback is that this is less intuitive: at first glance it is hard to see why computing the factorial of 5 requires passing two arguments, 5 and 1.


    There are two ways around this. The first is to provide a normal-form function alongside the tail-recursive one.

    function tailFactorial(n, total) {
      if (n === 1) return total;
      return tailFactorial(n - 1, n * total);
    }

    function factorial(n) {
      return tailFactorial(n, 1);
    }

    factorial(5) // 120

    The code above wraps the tail-recursive tailFactorial in a normal-form factorial function, which reads much more naturally.


    Functional programming has a concept called currying, which means converting a function of several parameters into a chain of single-parameter functions. Currying can be used here too.

    function currying(fn, n) {
      return function (m) {
        return fn.call(this, m, n);
      };
    }

    function tailFactorial(n, total) {
      if (n === 1) return total;
      return tailFactorial(n - 1, n * total);
    }

    const factorial = currying(tailFactorial, 1);

    factorial(5) // 120

    The code above uses currying to turn the tail-recursive tailFactorial into a factorial that takes a single argument.


    The second method is much simpler: use ES6 default parameter values.

    function factorial(n, total = 1) {
      if (n === 1) return total;
      return factorial(n - 1, n * total);
    }

    factorial(5) // 120

    In the code above, the parameter total has a default value of 1, so callers need not supply it.
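    One practical caveat, offered as my own observation rather than as part of this article: most JavaScript engines never actually shipped the tail call optimization ES6 specifies (Safari's JavaScriptCore being the main exception), so deep tail recursion can still overflow in practice. A common manual workaround is a trampoline, sketched below; fact and trampoline are illustrative names of my own:

```javascript
// Trampoline: repeatedly invoke thunks in a loop, so the stack never grows.
function trampoline(f) {
  return function (...args) {
    let result = f(...args);
    while (typeof result === 'function') {
      result = result(); // each bounce starts from a fresh, shallow stack
    }
    return result;
  };
}

// Tail-recursive factorial rewritten to return a thunk instead of recursing.
function fact(n, total = 1) {
  if (n === 1) return total;
  return () => fact(n - 1, n * total);
}

const factorial = trampoline(fact);

console.log(factorial(5));      // 120
console.log(factorial(100000)); // no stack overflow; the numeric result
                                // overflows to Infinity, but it terminates
```

    The rewrite from "call yourself" to "return a thunk that calls you" is mechanical once the function is already in tail-recursive form, which is another reason the parameter-threading style taught above is worth the initial awkwardness.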


    ([Note] This article is excerpted from my book 《ECMAScript 6入门》.)





    Technical Q&A QA1176: Getting a Packet Trace

    Technical Q&A QA1176

    Getting a Packet Trace

    I'm trying to debug a network problem. How do I get a packet trace?

    A: This depends on your platform:

    • There are a number of programs for OS X that let you gather and analyze packet traces. See OS X Programs for details.

    • iOS does not support packet tracing directly. However, if you connect your iOS device to a Mac via USB, you can use an OS X packet trace program to gather and analyze traces using the remote virtual interface feature. See iOS Packet Tracing for details.

    Finally, Packet Trace Notes offers some hints and tips that you might find useful when dealing with packet traces.

    Important: Mention of third party sites and third party products is for informational purposes only and constitutes neither an endorsement nor a recommendation. Apple assumes no responsibility with regard to the selection, performance, or use of these vendors or products.

    OS X Programs

    OS X supports a wide range of packet trace programs, as described in the following sections.

    Charles HTTP Proxy

    Charles is an HTTP proxy that allows you to view all of the HTTP and HTTPS traffic between your machine and the Internet.


    Cocoa Packet Analyzer

    Cocoa Packet Analyzer is a native OS X implementation of a network protocol analyzer and packet sniffer.


    Debookee

    Debookee is an OS X application which allows you to see what your devices are sending over the network.


    FrameSeer

    FrameSeer is an inexpensive network packet capture application for OS X.

    HTTP Scoop

    HTTP Scoop is an HTTP protocol analyzer for OS X. It reconstructs complete HTTP conversations (rather than just showing the packets that make them up) and presents them in a user-friendly manner.


    IPNetMonitorX

    IPNetMonitorX is a network troubleshooting toolkit for debugging Internet service problems and optimizing performance.


    tcpdump

    This command line tool is built in to all versions of OS X, and is also available on many other Unix platforms. For a quick summary of how to use tcpdump, see Getting Started With tcpdump.


    tcpflow

    If you're debugging a high-level protocol, it's nice to see the various TCP connections as streams of data rather than individual packets. The tcpflow tool can do that for you. If you've not used tcpflow before, there's a quick introduction in Getting Started With tcpflow.

    The tcpflow tool is not built in to OS X, but you can get it in a variety of ways.

    • Download and build the source code from its home site.

    • Install it via a 'ports' project, like MacPorts or Fink.

    • Download a pre-built binary from here.


    tcptrace

    tcptrace is an open source tool for analyzing the TCP connections in a packet trace.


    Wireshark

    Wireshark is an open source packet analyzer that has been ported to OS X. It requires X11.

    Wireless Diagnostics

    Wireless Diagnostics is an application built in to OS X that lets you capture a Wi-Fi level packet trace. Such traces contain more information than a standard packet trace (for example, they show Wi-Fi's link-layer retransmissions).

    You can find Wireless Diagnostics in the /System/Library/CoreServices directory; on later systems it might be in the Applications subdirectory within that directory. On OS X 10.7 the application was called Wi-Fi Diagnostics.

    See Wi-Fi Capture for more information about using this tool.

    Back to Top

    iOS Packet Tracing

    iOS does not support packet tracing directly. However, if you're developing for iOS you can take a packet trace of your app in a number of different ways:

    • If the problem you're trying to debug occurs on Wi-Fi, you can put your iOS device on a test Wi-Fi network. See Wi-Fi Capture for details.

    • If your app uses HTTP, you can configure your iOS device to use a debugging HTTP proxy (such as Charles HTTP Proxy).

    • In iOS 5 and later you can use the remote virtual interface facility.

    Remote Virtual Interface

    iOS 5 added a remote virtual interface (RVI) facility that lets you use OS X packet trace programs to capture traces from an iOS device. The basic strategy is:

    1. Connect your iOS device to your Mac via USB.

    2. Set up an RVI for that device. This creates a virtual network interface on your Mac that represents the iOS device's networking stack.

    3. Run your OS X packet trace program, and point it at the RVI created in the previous step.

    To set up an RVI, you should run the rvictl tool as shown below.

    $ # First get the current list of interfaces.
    $ ifconfig -l
    lo0 gif0 stf0 en0 en1 p2p0 fw0 ppp0 utun0
    $ # Then run the tool with the UDID of the device.
    $ rvictl -s 74bd53c647548234ddcef0ee3abee616005051ed
    Starting device 74bd53c647548234ddcef0ee3abee616005051ed [SUCCEEDED]
    $ # Get the list of interfaces again, and you can see the new virtual
    $ # network interface, rvi0, added by the previous command.
    $ ifconfig -l
    lo0 gif0 stf0 en0 en1 p2p0 fw0 ppp0 utun0 rvi0

    Now that you know the name of the RVI, you can point your packet trace tool at it. For example, here's how you might run tcpdump to take a packet trace from the RVI and write it to the file trace.pcap.

    $ sudo tcpdump -i rvi0 -w trace.pcap
    tcpdump: WARNING: rvi0: That device doesn't support promiscuous mode
    (BIOCPROMISC: Operation not supported on socket)
    tcpdump: WARNING: rvi0: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on rvi0, link-type RAW (Raw IP), capture size 65535 bytes

    When you're done you can remove the RVI with the following command.

    $ rvictl -x 74bd53c647548234ddcef0ee3abee616005051ed
    Stopping device 74bd53c647548234ddcef0ee3abee616005051ed [SUCCEEDED]

    Important: The RVI represents the entire networking stack of the iOS device; there's no way to trace a specific interface on the device, or even learn which packets were transferred on which interface.

    Remote Virtual Interface Troubleshooting

    This section explains how to resolve some common issues with RVI.

    If your Mac doesn't have the rvictl tool, make sure you install Xcode 4.2 or later.

    If the device is running iOS 7 or later, you must use the RVI support installed by Xcode 5.0 or later.

    The RVI support installed by Xcode 5.0 works best on OS X 10.9 and later. Specifically, if you run tcpdump on 10.8.x and see the message "unknown ip 0", you will need to update to 10.9 to access those packets via RVI.

    If rvictl fails with the message:

    bootstrap_look_up(): 1102

    make sure that the com.apple.rpmuxd launchd job is loaded correctly. The following command should print information about the job.

    $ sudo launchctl list com.apple.rpmuxd
        "Label" = "com.apple.rpmuxd";

    If it fails, it could be because the job is unloaded. You can force it to load with the following command.

    $ sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.rpmuxd.plist

    Back to Top

    Packet Trace Notes

    Getting Started With tcpdump

    To get started with tcpdump, try the following command.

    sudo tcpdump -i en0 -w trace.pcap

    The elements of this command line are:

    • The sudo command causes tcpdump to run with privileges, which is necessary in order to capture network traffic.

    • The -i en0 option tells tcpdump to capture packets on the first Ethernet interface. By default, tcpdump will use the first non-loopback interface it can find (usually en0). To specify a different interface, just change en0 to the BSD name of that interface. For example, the AirPort interface is typically en1.

      To get a list of network interfaces and their user-visible names, run the networksetup tool with the -listallhardwareports argument.

    • The -w trace.pcap parameter tells tcpdump to write the packets to a file called trace.pcap.

    Note: If you're running on a system prior to OS X 10.7 you should also supply the -B 524288 option, which increases the packet capture buffer size to 512 KiB. This is important to avoid dropped packets on high-speed networks. See Dropped Packets for more information about this. This larger buffer size is the default on OS X 10.7 and later.

    If you're running on a system prior to OS X 10.6 you should also supply the -s 0 option, which tells tcpdump to capture the full packet rather than just the first 68 bytes. This option is the default on OS X 10.6 and later.

    In response to this command, tcpdump will begin to capture packets and put them in the trace.pcap file. When you want to stop capturing, interrupt tcpdump by typing ^C. You can then display the contents of the packets as text using the following command.

    tcpdump -n -e -x -vvv -r trace.pcap

    New elements of the command line are:

    • The -n option means that addresses are not converted to domain names, which speeds things up considerably.

    • The -e option causes tcpdump to display the link-level header for each packet.

    • The -x option causes the contents of the packet to also be displayed in hex.

    • The -vvv option makes tcpdump's output as verbose as possible.

    • By specifying -r trace.pcap option you tell tcpdump to read packets from the file trace.pcap rather than from a network interface. Note that you don't need privileges to do this, so running tcpdump using sudo is not required.

    You can also combine these steps, as shown below, but if you do this you don't get a high-fidelity record of the packets that you captured.

    sudo tcpdump -i en0 -n -e -x -vvv

    You can learn about tcpdump from the online manual and from the book TCP/IP Illustrated, Volume 1: The Protocols, W. Richard Stevens, Addison-Wesley, 1994, ISBN 0-201-63346-9. That book is also an excellent introduction to TCP/IP protocols in general.

    Getting Started With tcpflow

    The tcpflow command makes it much easier to debug high-level protocols. For example, if you're debugging an HTTP client, you can run the following command.

    sudo tcpflow -i en0 port 80

    tcpflow will create a bunch of files in the current directory, each of which contains the reassembled contents of a single TCP stream. So, if you run tcpflow as shown above and then fetch the URL http://apple.com, you can see how the HTTP redirect works.

    $ sudo tcpflow -i en0 port 80
    tcpflow[953]: listening on en0
    tcpflow[953]: terminating
    $ ls -lh
    total 16
    -rw-r--r-- 1 root  quinn [...]
    -rw-r--r-- 1 root  quinn [...]
    $ # This is the request.
    $ cat
    GET / HTTP/1.1
    User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3
    Host: apple.com
    Accept: */*
    $ # And this is the response.
    $ cat
    HTTP/1.1 302 Object Moved
    Location: http://www.apple.com/
    Content-Type: text/html
    Cache-Control: private
    Connection: close
    <head><body> This object may be found <a HREF="http://www.apple.com/">here</a> </body>

    Loopback Issues

    Important: You should consult the documentation that comes with your program for accurate and up-to-date information about its limitations.

    Some packet trace programs have problems with packets being transferred to or from the trace machine (the machine running the packet trace program). To avoid these problems, separate your trace machine from the machines whose network traffic you're tracing.

    As an example of this, on OS X tcpdump may display the TCP checksum of packets sent by the trace machine as bad. This is because of TCP checksum offloading; packets sent by the trace machine are captured before being handed to the network adapter, so they don't have the TCP checksum set correctly. This is not a fatal problem; if the bad checksums bother you, you can turn off the check by passing the -K option to tcpdump.

    Dropped Packets

    If you capture all the bytes of each packet, it's very easy to overrun the kernel's packet capture buffer. The symptoms of this overrun are that your packet trace program will report that it dropped packets.

    In the case of tcpdump, it prints a summary of how many packets were captured, filtered, and dropped when you stop the capture. For example:

    $ sudo tcpdump -i en0 -w trace.pcap
    tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes
    94 packets captured
    177 packets received by filter
    0 packets dropped by kernel

    If the dropped count is non-zero, you need to increase the packet capture buffer size by passing the -B option to tcpdump, as discussed earlier.

    Switches And Hubs

    If you use a separate trace machine, you have to make sure that the trace machine can see the packets of interest. There are two ways to do this:

    • Use a hub rather than a switch — These days it is hard to find real hubs. Most 10/100 hubs are actually switches in disguise. However, it is possible to find a 10/100 hub that only switches between the different speed segments (for example, the SMC-EZ58xxDS range).

    • Enable port mirroring — On most advanced switches it is possible to configure the switch so that all traffic is mirrored to a specific port. To learn more about this, consult the documentation for your switch.

    Capture Hints From The Wireshark Wiki

    The Wireshark wiki has some really useful information about how to setup your packet tracing environment.

    Wi-Fi Capture

    Capturing packets on Wi-Fi can be tricky because conversations between one Wi-Fi client and the access point are not necessarily visible to other Wi-Fi clients. There are two easy ways to ensure that you see the relevant Wi-Fi traffic:

    • bridge mode — If your Wi-Fi access point supports bridge mode (for example, all Apple base stations do), you can bridge the Wi-Fi on to an Ethernet and then use standard Ethernet techniques to capture a packet trace. You may not be able to see Wi-Fi to Wi-Fi traffic, but in many situations that's not a problem.

    • Internet Sharing — If you enable Internet Sharing on your Mac, and have your Wi-Fi clients join the shared network, you can run your packet trace program on the Mac and see all the Wi-Fi traffic. If you target the Mac's Wi-Fi interface, you will see all traffic including Wi-Fi to Wi-Fi traffic. If you target the Ethernet interface, you will only see traffic entering or leaving the Wi-Fi network.

    Alternatively, you can use the Wireless Diagnostics application to take a Wi-Fi level packet trace. This shows all traffic visible to your Mac, including low-level Wi-Fi traffic that's not visible with other tools. When using this tool, keep in mind the following:

    • After running the application, you can access the packet trace feature by choosing Utilities from the Window menu and then selecting the Frame Capture tab.

    • Your Mac can't use the Wi-Fi interface for normal network traffic while tracing.

    • You must choose a channel to trace on. It simplifies things if you configure your access point to use a specific channel rather than let it choose one automatically.

    • If the Wi-Fi network has a password, Wi-Fi encryption will make it much harder to examine the trace. To get around this, either temporarily turn off the Wi-Fi password on your network or use a separate test network that has no password.

    Submitting A Trace With A Bug Report

    If you're submitting a bug report related to networking, you should consider submitting a packet trace along with your bug report. The Bug Reporting page has details on how to provide additional information with your bug report.

    Submitting A Trace To DTS

    If you send a packet trace to DTS, please include the following:

    • The system type and OS version of the trace machine.

    • The name and version of the program you used to capture the packet trace.

    • If you've used a program whose native file format is the libpcap file format (these include tcpdump, Wireshark, and various others), you can send us the packet trace file in that format. Otherwise, please include a copy of the packet trace in both its native format and, if that native format isn't text, a text export of the trace as well. That way we're guaranteed to be able to read your packet trace.

    • For each relevant machine shown in the trace, please describe the following:

      • the machine's role in the network conversation

      • the system type and OS version

      • the machine's IP address

      • the machine's hardware address (also known as the Ethernet address or MAC address)

    Back to Top

    Document Revision History

    Date Notes

    Changed the RVI example to save the packet trace to disk, which is generally the most useful option (r. 15449523).


    Added a reference to the Wireless Diagnostics application (r. 14454725). Added the "Submitting A Trace With A Bug Report" section. Added a section on RVI troubleshooting.


    Added references to tcptrace (r. 11809312) and Debookee (r. 12929510). Simplified the example tcpdump command by removing options that are applied by default on recent systems. Made minor editorial changes.


    Documented iOS 5's remote virtual interface feature (r. 10194867), and reworked the document structure to account for that change. Added a section on Wi-Fi packet tracing.


    Added reference to the Charles HTTP proxy (r. 9053410).


    Added sections on tcpflow and HTTP Scoop. Added information about dropped packets and how to avoid them. Various minor editorial changes.


    Streamlined product descriptions and reorganized the Miscellaneous notes. Restructured entire document and added additional packages. Added additional Hub/Switch information.


    Added a reference to FrameSeer. Converted text to TNT and fixed formatting and link problems.


    New document that lists tools available for looking at the network packets on the wire, plus some helpful hints on how to use those tools.

    Copyright © 2014 Apple Inc. All Rights Reserved. Terms of Use | Privacy Policy | Updated: 2014-08-28