Network Analysis using Tstat and Awk Script
- FA
- Jan 28, 2015
Tstat was originally designed to automate the collection of TCP statistics on aggregated traffic using real-time monitoring. Over time it evolved into a more sophisticated tool that offers extensive statistics and functionality. It is written in ANSI C for efficiency and is released as open source. Tstat supports real-time analysis of multi-gigabit-per-second traffic on commodity hardware, and its design is flexible, with plug-in modules that provide different capabilities.
AWK is a scripting language used for text processing and data manipulation. It operates on structured data, typically in files, by applying rules to patterns. An AWK script consists of rules that specify patterns and actions. It reads input line by line, matches patterns, and performs associated actions. AWK has built-in features like pattern matching, string manipulation, arithmetic operations, variables, arrays, and control structures. It supports regular expressions and provides useful variables like $0 (entire line) and $1, $2 (fields/columns). AWK is widely used for tasks like data extraction, formatting, reporting, and analysis due to its simplicity and efficiency in handling large datasets.
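As a small illustration of the pattern/action model described above, the sketch below runs a two-rule AWK program over three made-up lines. The input data and the shell variables `input` and `result` are invented for the example and are not part of the Tstat analysis.

```shell
# Three made-up input lines: a header followed by two records.
input='name bytes
alice 120
bob 80'

# "NR != 1" is the pattern (skip the header line); the braces hold the action.
# $1 and $2 are the first and second whitespace-separated fields.
# The END rule runs once, after the last input line.
result=$(printf '%s\n' "$input" | awk '
NR != 1 { total += $2; print $1 " sent " $2 " bytes" }
END     { print "total " total }
')

printf '%s\n' "$result"
```

The same structure (a filter on `NR` and on selected fields, plus an `END` rule that prints a summary) is used by all the scripts in this post.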
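# Skip the header line; keep rows with $99 == 1 and $101 == 1
# whose FQDN (column 115) matches facebook.com.
NR != 1 && $99 == 1 && $101 == 1 && $115 ~ /facebook\.com/ {
    hist[$45]++          # histogram of server IP addresses
    fqdn[$45] = $115     # FQDN last seen for this server IP
    count++              # total count of matching connections
}
END {
    print "Server_IP_Address", "Number_of_Connections", "Fraction", "Percentage", "FQDN"
    for (x in hist)
        print x, hist[x], hist[x]/count, hist[x]/count*100, fqdn[x]
}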
The script is saved as 4_5.awk and run on the compressed Tstat log with the following command:
farhan@ubuntu:~/Desktop$ zcat log_tcp_complete.gz | awk -f 4_5.awk | column -t > 4_5.txt
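To see what the histogram script computes without the full log, here is a minimal sketch on a toy input. The three-column layout (server IP, connection type, FQDN), the addresses and the variable names `log` and `result` are assumptions made for the example; the real log_tcp_complete file uses the column numbers shown in the script above.

```shell
# Toy log: a header, then server_ip, connection_type, fqdn.
# This 3-column layout is made up; it is not the real Tstat column numbering.
log='s_ip con_t fqdn
10.0.0.1 1 www.facebook.com
10.0.0.1 1 www.facebook.com
10.0.0.2 1 graph.facebook.com
10.0.0.3 1 www.google.com
10.0.0.1 1 www.facebook.com
10.0.0.2 0 graph.facebook.com'

# Same idea as the script above: filter rows, count them per server IP,
# then print each count together with its share of all matching rows.
# "for (ip in hist)" visits keys in no fixed order, hence the final sort.
result=$(printf '%s\n' "$log" | awk '
NR != 1 && $2 == 1 && $3 ~ /facebook\.com/ {
    hist[$1]++          # connections per server IP
    fqdn[$1] = $3       # name last seen for that IP
    count++             # all matching connections
}
END {
    for (ip in hist)
        print ip, hist[ip], hist[ip] / count, fqdn[ip]
}' | sort)

printf '%s\n' "$result"
```

Four rows pass the filter, three of them for 10.0.0.1, so that address gets a fraction of 0.75 and 10.0.0.2 gets 0.25.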
The plot using gnuplot is shown below:
As above, but sort the results in increasing number of flows and plot the result using a histogram.
farhan@ubuntu:~/Desktop$ zcat log_tcp_complete.gz | awk -f 4_5.awk | column -t | sort -n -k2 > 4_5.txt
# Count connections per FQDN (column 115), skipping the header line.
NR != 1 { fqdn[$115]++ }
END {
    for (x in fqdn)
        print x, fqdn[x]
}
farhan@ubuntu:~/Desktop$ zcat log_tcp_complete.gz | awk -f 5_2.awk | column -t | sort -n -k2 | tail -n 10
ib.adnxs.com                  11535
fb1.farm2.zynga.com           11744
star.c10r.facebook.com        13824
www.google.com                13837
fbstatic-a.akamaihd.net       14809
graph.facebook.com            15832
fbcdn-profile-a.akamaihd.net  20742
profile.ak.fbcdn.net          20851
www.facebook.com              21612
-                             1040077
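The count, sort and tail pipeline used for this top-10 list can be tried on a handful of lines. The sketch below is a toy version under stated assumptions: the input holds one FQDN per line (not the real column 115), the names and the variables `names` and `top2` are invented, and `-` is used as a placeholder name, which is presumably how connections without a recorded FQDN appear in the log.

```shell
# One made-up FQDN per line after a header; "-" stands for "no name recorded".
names='fqdn
www.facebook.com
-
www.google.com
-
www.facebook.com
-
graph.facebook.com'

# Count occurrences per name, sort numerically on the count (field 2),
# and keep the last two lines, i.e. the two most frequent names.
top2=$(printf '%s\n' "$names" | awk '
NR != 1 { seen[$1]++ }
END     { for (name in seen) print name, seen[name] }
' | sort -n -k2 | tail -n 2)

printf '%s\n' "$top2"
```

Because the sort is ascending, the most frequent name ends up on the last line, as in the listing above.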
The awk script used here is:
# Count connections per connection type (column 101), skipping the header line.
NR != 1 {
    conntype[$101]++
    count++
}
END {
    for (x in conntype)
        print x, conntype[x], conntype[x]/count
}
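# For every block of 10000 connections, record how many were HTTP, BitTorrent and eMule.
NR != 1 {
    if ($101 == 1) count++
    if ($101 == 131072 || ($101 == 256 && $102 == 9)) count1++
    if ($101 == 16384 || ($101 == 256 && ($102 == 1 || $102 == 3))) count2++
    if ((NR-1) % 10000 == 0) {
        http[NR-1] = count - lastcount
        bittorrent[NR-1] = count1 - lastcount1
        emule[NR-1] = count2 - lastcount2
        lastcount = count
        lastcount1 = count1
        lastcount2 = count2
    }
}
END {
    # Connections left over after the last full block, if any.
    # The guard keeps a block that was just completed from being overwritten with zeros.
    if ((NR-1) % 10000 != 0) {
        http[NR-1] = count - lastcount
        bittorrent[NR-1] = count1 - lastcount1
        emule[NR-1] = count2 - lastcount2
    }
    print "Time", "HTTP_Fraction", "Bittorrent_Fraction", "eMule_Fraction"
    # t is the time, measured in connections; t/10000 is the block index.
    for (t in http)
        print t/10000, http[t]/10000, bittorrent[t]/10000, emule[t]/10000
}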
farhan@ubuntu:~/Desktop$ zcat log_tcp_complete.gz | awk -f 5_4.awk | sort -n -k1 > 5_4_results.txt
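The fixed-size binning behind this time series is easier to follow with a tiny bin. The following sketch is a variation, not the script used for the results: it assumes a one-column input where 1 marks an HTTP connection, uses a bin of 2 connections instead of 10000, prints each bin as soon as it closes, and divides a trailing partial bin by its actual size. The variable names `types`, `result`, `bin` and `rest` are invented for the example.

```shell
# One connection type per line after a header (1 stands for HTTP here).
# The bin size is 2 instead of 10000 to keep the example small.
types='con_t
1
1
0
1
0
0
1'

result=$(printf '%s\n' "$types" | awk -v bin=2 '
NR != 1 {
    n = NR - 1                        # connection index, header excluded
    if ($1 == 1) http++
    if (n % bin == 0) {               # a bin has just filled up
        print n / bin, (http - last) / bin
        last = http
    }
}
END {
    rest = (NR - 1) % bin             # size of the trailing partial bin
    if (rest > 0)
        print int((NR - 1) / bin) + 1, (http - last) / rest
}')

printf '%s\n' "$result"
```

Printing inside the main rule keeps the rows in time order, so no sort step is needed afterwards. Storing the values in arrays and printing them in the END rule works as well, but the rows then come out in no fixed order and have to be sorted.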
farhan@ubuntu:~/Desktop$ zcat log_tcp_complete.gz | head -n 1000000 | awk -f 5_7.awk | column -t | sort -n -k2
number  IP_address       names
1       74.125.209.22    ,v7.cache1.c.youtube.com
1       74.125.213.246   ,v7.cache8.c.youtube.com
1       74.125.214.180   ,v5.cache6.c.youtube.com
1       74.125.214.208   ,v1.cache7.c.youtube.com
1       74.125.215.81    ,v2.cache3.c.youtube.com
1       74.125.216.112   ,v1.cache4.c.youtube.com
1       74.125.218.181   ,v6.cache6.c.youtube.com
1       173.194.2.13     ,v17.lscache1.c.youtube.com
2       208.117.245.164  ,tc.v23.cache4.c.youtube.com
2       208.117.245.230  ,tc.v16.cache8.c.youtube.com
1       208.65.155.18    ,tc.v11.cache1.c.youtube.com
29      208.65.154.142   ,tc.v23.cache4.c.youtube.com
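The script 5_7.awk itself is not listed here. The sketch below is only a guess at how output of this shape (a count, a server IP and a comma-prefixed list of names) could be produced; it is not the original script. It assumes a made-up two-column input (server IP, FQDN) rather than the real Tstat columns, and the addresses, names and variables `log`, `result`, `number`, `seen` and `names` are all hypothetical.

```shell
# Toy log: a header, then server_ip and fqdn (made-up 2-column layout).
log='s_ip fqdn
10.0.0.1 v1.cache1.c.youtube.com
10.0.0.2 www.google.com
10.0.0.1 v1.cache1.c.youtube.com
10.0.0.3 v2.cache3.c.youtube.com
10.0.0.1 v5.cache6.c.youtube.com'

# Count connections per server IP and collect the distinct names seen for it.
# Each new name is appended with a leading comma, which would explain the
# comma in front of every name in the output above (an assumption).
result=$(printf '%s\n' "$log" | awk '
NR != 1 && $2 ~ /youtube\.com/ {
    number[$1]++
    if (!(($1, $2) in seen)) {        # record each name once per IP
        seen[$1, $2] = 1
        names[$1] = names[$1] "," $2
    }
}
END {
    for (ip in number)
        print number[ip], ip, names[ip]
}' | sort -k2)

printf '%s\n' "$result"
```

In this toy run 10.0.0.1 appears three times with two distinct names, so its row carries a count of 3 and a two-entry name list.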