Internet Outage Detection

And My First Foray Into Julia

Author

Santiago Rodriguez

Published

March 25, 2023

Modified

November 7, 2024

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more, see the Quarto website.

Background

In March of 2023 my household was suffering from intermittent internet issues. Naturally, I called my ISP and they claimed there were no issues on their end. They stated the issue was on my end, and since I use my own equipment, they could do nothing for me.

So, I created a bash script to ping a website n times and log the results to a text file. I used a spare computer to schedule a cron job that ran the script every minute. My goal with the bash script was to assess whether there were patterns in the outages and to test whether any change I made resolved the intermittent internet issues. Unfortunately, my frustration with my ISP got the better of me, and while I was setting up the cron job I changed the ethernet cable between the modem and router. The cron job started after this change and, as fate would have it, replacing the ethernet cable resolved our issues. While it didn’t take long to create the bash script and schedule the cron job, I didn’t want my efforts to be in vain, so I left the computer on for about two weeks, just because.

At the same time as the internet drama was unfolding, I was learning Julia. At some point it occurred to me that I could use the output of the bash script to practice what I had learned. It seems the programming gods found favor in my efforts because near the end of the two-week period another outage occurred, but this time it was the ISP’s fault.

Without further ado…

Logistics

  • activate the project
  • import libs
  • define paths

Activate

# activate project
# using Pkg
# Pkg.activate(@__DIR__)

In R, I would use here::here() to get the project root directory, but here is a separate library. In Julia, the @__DIR__ macro serves a similar purpose and is built into the language: it expands to the directory of the file that contains it.

Initially, this Quarto file was in a directory called ./analyses, so the @__DIR__ macro pointed to that directory, a child of the project root directory. I moved this file up a level into the project root directory so that the @__DIR__ macro would work as intended, that is, refer to the project root directory.

I preferred to move the file because it avoids hardcoding a variable such as project_root = "./home/user/project_root".
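As a rough illustration of the alternative to moving the file, the sketch below derives a parent directory with `dirname` instead of hardcoding it. The layout (`project_root/analyses`) and the variable names are hypothetical stand-ins; in a real file, `@__DIR__` would take the place of the `analysis_dir` string.

```julia
# Hypothetical layout for illustration only: <project_root>/analyses/report.qmd
analysis_dir = joinpath("project_root", "analyses")  # stand-in for @__DIR__

# step up one level from the file's directory to reach the project root
project_root = dirname(analysis_dir)

# build further paths from the derived root with joinpath (no hardcoded separators)
log_file_example = joinpath(project_root, "data", "output", "log.txt")

println(project_root)
println(log_file_example)
```

Either approach keeps absolute paths out of the code; moving the file simply avoids the extra `dirname` step.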

Another comment about Quarto and Julia. According to the virtual environments docs, Quarto doesn’t yet leverage virtual environments with Julia, so using Pkg.activate(@__DIR__) or Pkg.activate(".") doesn’t do anything when rendering. This is a shame because package management is a selling point for Julia. Instead of using my project env I had to install the libs below in my root env, which is not ideal since many Julia libs are still young and actively changing.

Edit Oct 2024

Recently, sometime in 2024, Posit announced a new Julia-native engine for Quarto. This new engine leverages the project environment!! Woohoo!!

Import

# import project libs
using DataFrames
using DataFramesMeta
using Revise
using Statistics
using Dates
using TimeZones
using Plots

Define Paths

  • Using the @__DIR__ macro helps minimize the number of hardcoded strings in the project
# define paths
path_data = joinpath(@__DIR__, "data")
path_data_output = joinpath(path_data, "output")

# define files
log_file = joinpath(path_data_output, "log.txt")

# used to hide details since julia prints last line
""
""

Import Data

Steps:

  1. import the text file created from the bash script running ping website -n
  2. loop through each line of the file
    • where the line contains the word “error” push that to an object called errors
    • else push line to object called valid
  3. using broadcasting (i.e., vectorization) split each line of errors and valid on “,”
#=
if file exists then
    - read each line in the file
    - if the line contains "error" add it to errors, else add it to valid
    - split the strings via ","
=#

if isfile(log_file)
    # logistics: create empty, typed containers
    errors = String[]
    valid = String[]

    # read each line of the file
    # (eachline opens the file and closes it once iteration finishes)
    for line in eachline(log_file)
        # check if the word "error" is present in the line
        if occursin("error", line)
            push!(errors, line)
        else
            push!(valid, line)
        end
    end

    # split every line on "," (broadcast over each element)
    errors_split = split.(errors, ",")
    valid_split = split.(valid, ",")

    # used to hide details since julia prints last line
    ""
end
""

I put some of the lines into a separate errors object because I’m not sure what causes these errors. As will be shown below, when there is no internet the command ping website -n in the bash script still runs, and its output does not contain the word “error”.

Visual Inspection

Let’s peer into the errors_split and valid_split objects.

errors_split[1:end]
1-element Vector{Vector{SubString{String}}}:
 ["Wed Mar 15 17:54:01 CDT 2023 ", " 10 packets transmitted", " 6 received", " +1 errors", " 40% packet loss", " time 9056ms"]
valid_split[1:5]
5-element Vector{Vector{SubString{String}}}:
 ["Wed Mar 15 06:57:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
 ["Wed Mar 15 06:58:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
 ["Wed Mar 15 06:59:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9013ms"]
 ["Wed Mar 15 07:00:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
 ["Wed Mar 15 07:01:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
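One practical reason to keep the two groups apart shows up in the output above: the error line carries an extra `+1 errors` field, so it splits into six pieces while a valid line splits into five. A small check on two lines rebuilt from that output (the variable names are only for this illustration):

```julia
# Two log lines rebuilt from the split output shown above
valid_line = "Wed Mar 15 06:57:01 CDT 2023 , 10 packets transmitted, 10 received, 0% packet loss, time 9012ms"
error_line = "Wed Mar 15 17:54:01 CDT 2023 , 10 packets transmitted, 6 received, +1 errors, 40% packet loss, time 9056ms"

# split on "," and trim whitespace from each field
valid_fields = strip.(split(valid_line, ","))
error_fields = strip.(split(error_line, ","))

println(length(valid_fields))  # 5
println(length(error_fields))  # 6
println(error_fields[4])       # +1 errors
```

Because the fourth field differs between the two shapes, an error line would not fit a five-column layout without extra handling.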

Data Prep: Non-Error Data

Steps:

  1. initialize an empty matrix, filled with missing (Julia’s counterpart to NA in R)
  2. fill the matrix
    • for each element in each line within …, set the i,j element in the matrix to the corresponding value
  3. convert the matrix to a dataframe
  4. rename the column/fields/variables
  5. dataframe transformation

Initialize Matrix

# initialize empty matrix
valid_matrix = Matrix{Union{Missing, String}}(
    # default value
    missing,
    # nrow
    length(valid_split),
    # ncol - derived from first element in ...
    # - assumes that the first element is complete
    length(valid_split[1])
)

# used to hide details since julia prints last line
""
""

Populate Matrix

for i in eachindex(valid_split)
    for j in eachindex(valid_split[i])
        # remove leading, trailing whitespace
        valid_matrix[i, j] = strip(valid_split[i][j])
    end
end

I used a nested loop to populate the matrix because loops in Julia are fast!
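A side effect of pre-filling the matrix with `missing` is that short rows are handled gracefully: cells that a row never writes to simply stay `missing`. The toy example below assumes, purely for illustration, that a line logged during an outage contains only the timestamp; the real log may look different, and the second timestamp is made up.

```julia
# Toy input: one complete row and one short row.
# Assumption for illustration: an outage line holds only the timestamp.
rows = [
    ["Wed Mar 15 06:57:01 CDT 2023 ", " 10 packets transmitted", " 10 received"],
    ["Fri Mar 24 11:00:01 CDT 2023 "],
]

# pre-fill with missing; column count taken from the first (complete) row
m = Matrix{Union{Missing, String}}(missing, length(rows), length(rows[1]))

for i in eachindex(rows)
    for j in eachindex(rows[i])
        # remove leading, trailing whitespace
        m[i, j] = strip(rows[i][j])
    end
end

println(m[1, 2])             # 10 packets transmitted
println(ismissing(m[2, 2]))  # true
```

The same reasoning explains why the column count is taken from the first row: that row has to be a complete one.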

Visual Inspection

Let’s peer into the matrix.

valid_matrix[1:5,:]
5×5 Matrix{Union{Missing, String}}:
 "Wed Mar 15 06:57:01 CDT 2023"  …  "0% packet loss"  "time 9012ms"
 "Wed Mar 15 06:58:01 CDT 2023"     "0% packet loss"  "time 9012ms"
 "Wed Mar 15 06:59:01 CDT 2023"     "0% packet loss"  "time 9013ms"
 "Wed Mar 15 07:00:01 CDT 2023"     "0% packet loss"  "time 9012ms"
 "Wed Mar 15 07:01:01 CDT 2023"     "0% packet loss"  "time 9012ms"

Convert to Dataframe

# convert to data frame
valid_df = DataFrame(valid_matrix, :auto)

# used to hide details since julia prints last line
""
""

Visual Inspection

Let’s peer into the dataframe.

valid_df[1:5,:]
5×5 DataFrame
 Row │ x1                            x2                      x3           x4              x5
     │ String?                       String?                 String?      String?         String?
─────┼───────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Wed Mar 15 06:57:01 CDT 2023  10 packets transmitted  10 received  0% packet loss  time 9012ms
   2 │ Wed Mar 15 06:58:01 CDT 2023  10 packets transmitted  10 received  0% packet loss  time 9012ms
   3 │ Wed Mar 15 06:59:01 CDT 2023  10 packets transmitted  10 received  0% packet loss  time 9013ms
   4 │ Wed Mar 15 07:00:01 CDT 2023  10 packets transmitted  10 received  0% packet loss  time 9012ms
   5 │ Wed Mar 15 07:01:01 CDT 2023  10 packets transmitted  10 received  0% packet loss  time 9012ms

Rename

#=
data prep
- rename vars (in place) (hard coded)
=#
rename!(
    valid_df,
    :x1 => :datetime,
    :x2 => :transmitted,
    :x3 => :received,
    :x4 => :packet_loss,
    :x5 => :runtime_ms
)

# used to hide details since julia prints last line
""
""

The “!” at the end of a function name is a cool convention in Julia: it signals that the function modifies its argument in place. Here, rename!(valid_df, ...) has the same effect as valid_df = rename(valid_df, ...).
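The same naming pattern shows up throughout Base Julia. As a small illustration with made-up numbers, `sort` returns a new vector while `sort!` changes the one it is given:

```julia
# made-up values for illustration
packets = [10, 6, 10, 3]

# sort returns a new vector and leaves its argument untouched
sorted_copy = sort(packets)

# sort! reorders the vector it receives, in place
in_place = copy(packets)
sort!(in_place)

println(packets)      # [10, 6, 10, 3]
println(sorted_copy)  # [3, 6, 10, 10]
println(in_place)     # [3, 6, 10, 10]
```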

Transformations

#= transform:
- remove special chars; keep digits
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = if !ismissing(:transmitted) replace(:transmitted, r"\D+" => "") end,
        :received = if !ismissing(:received) replace(:received, r"\D+" => "") end,
        :packet_loss = if !ismissing(:packet_loss) replace(:packet_loss, r"\D+" => "") end,
        :runtime_ms = if !ismissing(:runtime_ms) replace(:runtime_ms, r"\D+" => "") end,
    )
end

#= transform:
- convert "" to missing
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = :transmitted == "" ? missing : :transmitted,
        :received = :received == "" ? missing : :received,
        :packet_loss = :packet_loss == "" ? missing : :packet_loss,
        :runtime_ms = :runtime_ms == "" ? missing : :runtime_ms,
    )
end

#= transform:
- convert CDT timezone to name found in tz lookup db
- convert datetime string to a DateTime value
    - Z from TimeZones package
=#
@chain valid_df begin
    @rtransform!(
        :datetime = replace(:datetime, "CDT" => "America/Chicago")
    )
    @rtransform!(
        :datetime = DateTime(:datetime, DateFormat("e u d H:M:S Z y"))
    )
end

#= transform:
- convert string to numeric
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = if !ismissing(:transmitted) && !isnothing(:transmitted) parse(Int, :transmitted) end,
        :received = if !ismissing(:received) && !isnothing(:received) parse(Int, :received) end,
        :packet_loss = if !ismissing(:packet_loss) && !isnothing(:packet_loss) parse(Int, :packet_loss) end,
        :runtime_ms = if !ismissing(:runtime_ms) && !isnothing(:runtime_ms) parse(Int, :runtime_ms) end,
    )
end

#= transform:
- convert packet loss to decimal [0,1]
=#
@chain valid_df begin
    @rtransform!(
        :packet_loss = if !ismissing(:packet_loss) && !isnothing(:packet_loss)
            (:packet_loss/100)
            end
    )
end

#= transform: MUST GO LAST
- convert nothing to missing
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = isnothing(:transmitted) ? missing : :transmitted,
        :received = isnothing(:received) ? missing : :received,
        :packet_loss = isnothing(:packet_loss) ? missing : :packet_loss,
        :runtime_ms = isnothing(:runtime_ms) ? missing : :runtime_ms,
    )
end

# used to hide details since julia prints last line
""
""

Unlike in R with dplyr, a column can’t be assigned more than once within a single @rtransform! call. That’s why there are multiple transform blocks above. However, it is possible to have several @rtransform! macros in one @chain call, as in the datetime block.
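One detail worth spelling out: an `if` without an `else` evaluates to `nothing` when its condition is false, which is why the final block above converts `nothing` to `missing`. As a possible alternative, and only a sketch, the digit stripping, empty-string check, and parsing could be folded into one plain function. The name `parse_count` is hypothetical and not part of the original analysis.

```julia
# an if without else yields nothing when the condition is false
result = if false
    1
end
println(result === nothing)  # true

# Hypothetical helper (not from the original post): strip non-digits,
# then parse, mapping missing or digit-free input to missing.
function parse_count(field)
    ismissing(field) && return missing
    digits_only = replace(field, r"\D+" => "")
    value = tryparse(Int, digits_only)  # nothing when digits_only is empty
    return value === nothing ? missing : value
end

println(parse_count("10 packets transmitted"))  # 10
println(parse_count("time 9012ms"))             # 9012
println(parse_count(""))                        # missing
println(parse_count(missing))                   # missing
```

Such a helper could, in principle, replace several of the blocks above with one assignment per column; this has not been checked against the original log.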

Visual Inspection

Let’s peer into the dataframe.

valid_df[1:5,:]
5×5 DataFrame
 Row │ datetime             transmitted  received  packet_loss  runtime_ms
     │ DateTime             Int64?       Int64?    Float64?     Int64?
─────┼────────────────────────────────────────────────────────────────────
   1 │ 2023-03-15T06:57:01           10        10          0.0        9012
   2 │ 2023-03-15T06:58:01           10        10          0.0        9012
   3 │ 2023-03-15T06:59:01           10        10          0.0        9013
   4 │ 2023-03-15T07:00:01           10        10          0.0        9012
   5 │ 2023-03-15T07:01:01           10        10          0.0        9012
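Since the resulting column is a plain `DateTime` that keeps the local clock time, a similar result can be sketched with the `Dates` standard library alone by dropping the zone abbreviation before parsing. This is an alternative illustration, not the approach used above, and it assumes every timestamp carries the literal `CDT` token.

```julia
using Dates

# first timestamp from the log output shown earlier
raw = "Wed Mar 15 06:57:01 CDT 2023"

# drop the zone abbreviation, then parse the remaining tokens
# e = abbreviated weekday, u = abbreviated month
no_zone = replace(raw, " CDT" => "")
parsed = DateTime(no_zone, dateformat"e u d H:M:S y")

println(parsed)  # 2023-03-15T06:57:01
```

The trade-off is that the zone information is discarded outright, so this only makes sense when all rows share one time zone.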

EDA

Next is a quick summary of the data via describe(df).

describe(valid_df)
5×7 DataFrame
 Row │ variable     mean         min                  median               max                  nmissing  eltype
     │ Symbol       Union…       Any                  Any                  Any                  Int64     Type
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ datetime                  2023-03-15T06:57:01  2023-03-20T13:49:31  2023-03-28T11:09:01         0  DateTime
   2 │ transmitted  10.0         10                   10.0                 10                        587  Union{Missing, Int64}
   3 │ received     9.99622      3                    10.0                 10                        587  Union{Missing, Int64}
   4 │ packet_loss  0.000377596  0.0                  0.0                  0.7                       587  Union{Missing, Float64}
   5 │ runtime_ms   9013.95      8998                 9014.0               10178                     587  Union{Missing, Int64}
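Two numbers in this summary can be sanity-checked with simple arithmetic. The sketch below assumes that each log line represents one minute (the cron schedule) and that the 587 rows with missing statistics correspond to minutes without a connection; the summary alone does not prove either. The variable names are only for this illustration.

```julia
# values read from the describe() output above
nmissing_rows = 587      # rows with no ping statistics
mean_received = 9.99622  # mean packets received per run
sent_per_run = 10        # packets transmitted per run

# one log line per minute, so rows are roughly minutes
outage_hours = nmissing_rows / 60
println(round(outage_hours, digits = 1))  # 9.8

# average loss implied by the mean number of received packets
implied_loss = (sent_per_run - mean_received) / sent_per_run
println(round(implied_loss, digits = 6))  # 0.000378
```

Under those assumptions, roughly 9.8 hours without statistics is in line with the outage of about 8-10 hours described in the conclusion, and the implied loss agrees with the reported mean packet_loss up to rounding.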

Plots

scatter(
    valid_df.datetime,
    valid_df.received,
    label =  "Received",
    ms = 2,
    mc = :red
)
plot!(
    valid_df.datetime,
    valid_df.transmitted,
    label = "Transmitted",
    linewidth = 3,
    lc = :blue
)
plot!(legend=:outerbottom, legendcolumns=2)
title!("Internet Analysis")
xlabel!("Datetime (minute interval)")
ylabel!("Packets")
ylims!(0, 11)

When there are no issues we expect 10 packets transmitted and 10 packets received (y = 10). The red dots below the blue line indicate something happened and fewer packets were received than were transmitted. There is a large gap between March 21 and March 24. Around March 21 I turned off the computer since replacing the ethernet cable between the modem and the router seemed to fix our intermittent internet issues. However, on March 24 at around 11 am there was an ISP internet outage in my area.

Summary:

  • y = 10 is normal and the blue line and red dots will overlap
  • red dots below the blue line indicate fewer packets were received than were transmitted
  • gaps in the blue line indicate no internet was available
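The same reading of the plot can be expressed as a couple of element-wise checks. The sketch below uses toy vectors rather than the real log, and the names `tx`, `rx`, `had_loss`, and `no_data` are hypothetical:

```julia
# Toy data (not from the log): packets per one-minute run
tx = [10, 10, 10, missing]  # transmitted
rx = [10, 6, 10, missing]   # received

# comparisons involving missing give missing, so treat those as "no loss recorded"
had_loss = coalesce.(rx .< tx, false)

# runs with no statistics at all (the gaps in the blue line)
no_data = ismissing.(tx)

println(count(had_loss))  # 1
println(count(no_data))   # 1
```

Keeping the two flags separate mirrors the plot: a red dot below the line is a run with loss, while a gap is a run with no data.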

Conclusion

From this work I gleaned that, over about two weeks in March 2023, there were several instances of packet loss that may indicate connectivity issues. On March 24, for about 8-10 hours, my household had no internet (i.e., no packets could be transmitted).

Using a bash script on a spare computer running a cron job every minute, I created a log of my household’s internet health. I used Julia to prep and analyze the data and I used Quarto to prepare this document. I used GitLab to host the git repository, which makes it easy to sync work between computers. SSH would also be useful for working across computers but I didn’t set that up. I used VS Code as the IDE.

If I were so inclined, I could keep the computer on to track the health of my internet connection. I could even explore ways to host and automate refreshing this report. Alas, I probably won’t do that since my goal with this project was to practice Julia and learn how to use Quarto with Julia in VS Code. Prior to this project, I had only used Quarto in RStudio with R and Python.

If you read this, thank you :)