And My First Foray Into Julia
In March of 2023 my household was suffering from intermittent internet issues. Naturally, I called my ISP and they claimed there were no issues on their end. They stated the issue was on my end, and since I use my own equipment, they could do nothing for me.
So, I created a bash script to ping a website n times and log the results to a text file. I used a spare computer to schedule a cron job to run every minute. My goal with the bash script was to assess if there were patterns in the outages and to test if any change I made resolved the intermittent internet issues. Unfortunately, my frustration with my ISP got the better of me and while I was setting up the cron job I changed the ethernet cable between the modem and router. The cron job was initiated after this change and as fate would have it, replacing the ethernet cable resolved our issues. While it didn’t take long to create the bash script and schedule the cron job, I didn’t want my efforts to be in vain so I left the computer on for about two weeks, just because.
At the same time as the internet drama was unfolding, I was learning Julia. At some point it occurred to me that I could use the output of the bash script to practice what I had learned. It seems the programming gods found favor in my efforts because near the end of the two-week period another outage occurred, but this time it was the ISP’s fault.
Without further ado…
# activate project
# using Pkg
# Pkg.activate(@__DIR__)
In R, I would use here::here() to get the project root directory, but here is a separate package. In Julia, the @__DIR__ macro serves a similar purpose and is built into the language; it expands to the directory of the file that contains it.
Initially, this Quarto file was in a directory called ./analyses and the @__DIR__ macro pointed to that directory, a child of the project root directory. I moved this file up a level into the project root dir so the @__DIR__ macro would work as intended, that is, refer to the project root directory.
I preferred to move the file because it avoids hardcoding a variable, such as project_root = "./home/user/project_root".
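As a rough sketch of the idea, here is how paths can be built relative to @__DIR__ rather than typed out by hand. The variable names file_dir and parent_dir are made up for this illustration and are not part of the project.

```julia
# @__DIR__ expands to the directory of the file it is written in
# (or the current working directory in the REPL or with `julia -e`).
file_dir = @__DIR__

# Build paths relative to that directory instead of hardcoding them.
path_data = joinpath(file_dir, "data")

# If this file still lived in ./analyses, the project root would be
# one level up, which dirname can reach.
parent_dir = dirname(file_dir)
```

Moving the file to the project root simply avoids the extra dirname step.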
Another comment about Quarto and Julia. According to the virtual environments docs, Quarto doesn’t currently leverage virtual environments with Julia [yet], so using Pkg.activate(@__DIR__) or Pkg.activate(".") doesn’t do anything when rendering. This is a shame because package management is a selling point for Julia. Instead of using my project env I had to install the libs below in my root env, which is not ideal since many Julia libs are still young and actively changing.
Edit Oct 2024
Recently, that is, sometime in 2024, Posit announced a new Julia-native engine for Quarto. This new engine now leverages the project environment!! Woohoo!!
# import project libs
using DataFrames
using DataFramesMeta
using Revise
using Statistics
using Dates
using TimeZones
using Plots
The @__DIR__ macro helps minimize the number of hardcoded strings in the project.
# define paths
path_data = joinpath(@__DIR__, "data")
path_data_output = joinpath(path_data, "output")
# define files
log_file = joinpath(path_data_output, "log.txt")
# used to hide details since julia prints last line
""
""
Steps:
ping website -n
#=
if file exists then
- import e/ line in the file
- if the line contains errors add it to .. else add it to ..
- split the strings via ","
=#
if isfile(log_file)
    # logistics: create empty objects
    errors = []
    valid = []
    # open the file for reading
    file = open(log_file, "r")
    # read each line of the file
    for line in eachline(file)
        # check if the word "error" is present in the line
        if occursin("error", line)
            push!(errors, line)
        else
            push!(valid, line)
        end
    end
    # close the file
    close(file)
    # split each line on "," (vectorized via broadcasting)
    errors_split = split.(errors, ",")
    valid_split = split.(valid, ",")
    # used to hide details since julia prints last line
    ""
end
""
I split some of the lines into an errors object because I’m not sure why there are errors. As will be shown below, when there is no internet the command ping website -n in the bash script still runs, and its output does not contain the word “error”.
Let’s peer into the error and valid objects.
errors_split[1:end]
1-element Vector{Vector{SubString{String}}}:
["Wed Mar 15 17:54:01 CDT 2023 ", " 10 packets transmitted", " 6 received", " +1 errors", " 40% packet loss", " time 9056ms"]
valid_split[1:5]
5-element Vector{Vector{SubString{String}}}:
["Wed Mar 15 06:57:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
["Wed Mar 15 06:58:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
["Wed Mar 15 06:59:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9013ms"]
["Wed Mar 15 07:00:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
["Wed Mar 15 07:01:01 CDT 2023 ", " 10 packets transmitted", " 10 received", " 0% packet loss", " time 9012ms"]
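To make the splitting step concrete without needing the log file, here is a small sketch that runs the same logic on two sample lines. The lines are reassembled from the outputs shown above; reading them from an IOBuffer instead of log.txt, and stripping whitespace at split time, are choices made only for this illustration.

```julia
# Two sample lines reassembled from the outputs shown in this post.
sample = """
Wed Mar 15 06:57:01 CDT 2023 , 10 packets transmitted, 10 received, 0% packet loss, time 9012ms
Wed Mar 15 17:54:01 CDT 2023 , 10 packets transmitted, 6 received, +1 errors, 40% packet loss, time 9056ms
"""

# An IOBuffer stands in for the opened log file.
lines = collect(eachline(IOBuffer(sample)))

# Same rule as above: any line containing "error" goes to errors.
errors = filter(l -> occursin("error", l), lines)
valid = filter(l -> !occursin("error", l), lines)

# Split on "," and strip the stray whitespace in one go.
valid_fields = [strip.(split(l, ",")) for l in valid]
error_fields = [strip.(split(l, ",")) for l in errors]
```

Note that the error line has six fields instead of five because of the extra "+1 errors" entry, which is one reason to keep it out of the fixed-width matrix built next.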
Steps:
# initialize empty matrix
valid_matrix = Matrix{Union{Missing, String}}(
    # default value
    missing,
    # nrow
    length(valid_split),
    # ncol - derived from first element in ...
    # - assumes that the first element is complete
    length(valid_split[1])
)
# used to hide details since julia prints last line
""
""
for i in eachindex(valid_split)
    for j in eachindex(valid_split[i])
        # remove leading, trailing whitespace
        valid_matrix[i, j] = strip(valid_split[i][j])
    end
end
I used a nested loop to populate the matrix because loops in Julia are fast!
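Here is a toy version of the same pattern, using made-up rows rather than the real log, to show why the matrix is pre-filled with missing: a row that is short a field simply leaves a gap.

```julia
# Toy ragged input: the second row is one field short (made-up values).
rows = [["a ", " b", " c"], ["d ", " e"]]

# Pre-fill with missing so short rows leave gaps instead of erroring.
m = Matrix{Union{Missing, String}}(missing, length(rows), length(rows[1]))

# Julia lets nested loops be written on one line.
for i in eachindex(rows), j in eachindex(rows[i])
    m[i, j] = strip(rows[i][j])
end
```

The column count still comes from the first row, so a row with more fields than the first would raise a bounds error.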
Let’s peer into the matrix.
valid_matrix[1:5, :]
5×5 Matrix{Union{Missing, String}}:
"Wed Mar 15 06:57:01 CDT 2023" … "0% packet loss" "time 9012ms"
"Wed Mar 15 06:58:01 CDT 2023" "0% packet loss" "time 9012ms"
"Wed Mar 15 06:59:01 CDT 2023" "0% packet loss" "time 9013ms"
"Wed Mar 15 07:00:01 CDT 2023" "0% packet loss" "time 9012ms"
"Wed Mar 15 07:01:01 CDT 2023" "0% packet loss" "time 9012ms"
# convert to data frame
valid_df = DataFrame(valid_matrix, :auto)
# used to hide details since julia prints last line
""
""
Let’s peer into the dataframe.
valid_df[1:5, :]
Row | x1 | x2 | x3 | x4 | x5 |
---|---|---|---|---|---|
 | String? | String? | String? | String? | String? |
1 | Wed Mar 15 06:57:01 CDT 2023 | 10 packets transmitted | 10 received | 0% packet loss | time 9012ms |
2 | Wed Mar 15 06:58:01 CDT 2023 | 10 packets transmitted | 10 received | 0% packet loss | time 9012ms |
3 | Wed Mar 15 06:59:01 CDT 2023 | 10 packets transmitted | 10 received | 0% packet loss | time 9013ms |
4 | Wed Mar 15 07:00:01 CDT 2023 | 10 packets transmitted | 10 received | 0% packet loss | time 9012ms |
5 | Wed Mar 15 07:01:01 CDT 2023 | 10 packets transmitted | 10 received | 0% packet loss | time 9012ms |
#=
data prep
- rename vars (in place) (hard coded)
=#
rename!(
    valid_df,
    :x1 => :datetime,
    :x2 => :transmitted,
    :x3 => :received,
    :x4 => :packet_loss,
    :x5 => :runtime_ms
)
# used to hide details since julia prints last line
""
""
The “!” at the end of a function name is a cool convention in Julia: it signals that the function modifies its argument in place. Here, rename!(valid_df, ...) is equivalent to valid_df = rename(valid_df, ...).
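The same convention shows up in Base, which makes for a quick sketch with sort and sort! on a throwaway vector (the variable names are made up for this example):

```julia
v = [3, 1, 2]

# sort returns a new sorted vector and leaves v untouched.
w = sort(v)
v_after_sort = copy(v)

# sort! sorts v itself, in place.
sort!(v)
```

After the first call v is still [3, 1, 2]; only after sort! does v itself change.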
#= transform:
- remove special chars; keep digits
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = if !ismissing(:transmitted) replace(:transmitted, r"\D+" => "") end,
        :received = if !ismissing(:received) replace(:received, r"\D+" => "") end,
        :packet_loss = if !ismissing(:packet_loss) replace(:packet_loss, r"\D+" => "") end,
        :runtime_ms = if !ismissing(:runtime_ms) replace(:runtime_ms, r"\D+" => "") end,
    )
end
#= transform:
- convert "" to missing
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = :transmitted == "" ? missing : :transmitted,
        :received = :received == "" ? missing : :received,
        :packet_loss = :packet_loss == "" ? missing : :packet_loss,
        :runtime_ms = :runtime_ms == "" ? missing : :runtime_ms,
    )
end
#= transform:
- convert CDT timezone to name found in tz lookup db
- convert datetime to datetime
- Z from TimeZones package
=#
@chain valid_df begin
    @rtransform!(
        :datetime = replace(:datetime, "CDT" => "America/Chicago")
    )
    @rtransform!(
        :datetime = DateTime(:datetime, DateFormat("e u d H:M:S Z y"))
    )
end
#= transform:
- convert string to numeric
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = if !ismissing(:transmitted) && !isnothing(:transmitted) parse(Int, :transmitted) end,
        :received = if !ismissing(:received) && !isnothing(:received) parse(Int, :received) end,
        :packet_loss = if !ismissing(:packet_loss) && !isnothing(:packet_loss) parse(Int, :packet_loss) end,
        :runtime_ms = if !ismissing(:runtime_ms) && !isnothing(:runtime_ms) parse(Int, :runtime_ms) end,
    )
end
#= transform:
- convert packet loss to decimal [0,1]
=#
@chain valid_df begin
    @rtransform!(
        :packet_loss = if !ismissing(:packet_loss) && !isnothing(:packet_loss)
            (:packet_loss/100)
        end
    )
end
#= transform: MUST GO LAST
- convert nothing to missing
=#
@chain valid_df begin
    @rtransform!(
        :transmitted = isnothing(:transmitted) ? missing : :transmitted,
        :received = isnothing(:received) ? missing : :received,
        :packet_loss = isnothing(:packet_loss) ? missing : :packet_loss,
        :runtime_ms = isnothing(:runtime_ms) ? missing : :runtime_ms,
    )
end
# used to hide details since julia prints last line
""
""
Unlike in R with dplyr, a column can’t be modified more than once within a single transform call. That’s why there are multiple transform blocks above. However, it is possible to have several @rtransform! macros in one @chain call.
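For what it's worth, the strip-digits, empty-to-missing, and parse steps could probably be folded into one plain function. The sketch below uses only Base; the name digits_or_missing is hypothetical and is not what the code above uses, and I haven't shown it wired into DataFramesMeta.

```julia
# Hypothetical helper (not from the code above): keep only the digits of a
# field and parse them, returning missing when there is nothing to parse.
function digits_or_missing(field)
    ismissing(field) && return missing
    digits_only = replace(field, r"\D+" => "")
    return isempty(digits_only) ? missing : parse(Int, digits_only)
end

transmitted = digits_or_missing("10 packets transmitted")
runtime_ms = digits_or_missing("time 9012ms")
packet_loss = digits_or_missing("40% packet loss") / 100
no_digits = digits_or_missing("")
```

One caveat with the r"\D+" approach, here and above: it glues together every digit in the field, so a value like "40.5%" would come out as 405.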
Let’s peer into the dataframe.
valid_df[1:5, :]
Row | datetime | transmitted | received | packet_loss | runtime_ms |
---|---|---|---|---|---|
 | DateTime | Int64? | Int64? | Float64? | Int64? |
1 | 2023-03-15T06:57:01 | 10 | 10 | 0.0 | 9012 |
2 | 2023-03-15T06:58:01 | 10 | 10 | 0.0 | 9012 |
3 | 2023-03-15T06:59:01 | 10 | 10 | 0.0 | 9013 |
4 | 2023-03-15T07:00:01 | 10 | 10 | 0.0 | 9012 |
5 | 2023-03-15T07:01:01 | 10 | 10 | 0.0 | 9012 |
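The datetime column is the least obvious conversion, so here is a stripped-down sketch using only the Dates standard library. It is a different technique from the one above: instead of mapping "CDT" to a TimeZones name and using the Z code, it assumes every timestamp carries the literal "CDT" and simply drops it.

```julia
using Dates

raw = "Wed Mar 15 06:57:01 CDT 2023"

# Assumption: every timestamp carries the literal abbreviation "CDT";
# dropping it discards the time zone rather than converting it.
no_tz = replace(raw, " CDT " => " ")

# e = abbreviated weekday, u = abbreviated month, d = day, y = year
fmt = DateFormat("e u d H:M:S y")
parsed = DateTime(no_tz, fmt)
```

This would break as soon as the log crossed into CST, which is a good argument for the TimeZones approach if the log ran for longer.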
Next is a quick summary of the data via describe(df).
describe(valid_df)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
 | Symbol | Union… | Any | Any | Any | Int64 | Type |
1 | datetime | | 2023-03-15T06:57:01 | 2023-03-20T13:49:31 | 2023-03-28T11:09:01 | 0 | DateTime |
2 | transmitted | 10.0 | 10 | 10.0 | 10 | 587 | Union{Missing, Int64} |
3 | received | 9.99622 | 3 | 10.0 | 10 | 587 | Union{Missing, Int64} |
4 | packet_loss | 0.000377596 | 0.0 | 0.0 | 0.7 | 587 | Union{Missing, Float64} |
5 | runtime_ms | 9013.95 | 8998 | 9014.0 | 10178 | 587 | Union{Missing, Int64} |
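The mean and nmissing columns can be reproduced by hand for a single column. A small sketch with made-up values (not the real log data) shows the idea:

```julia
using Statistics

# Made-up values for illustration; missing marks a minute with no parsed result.
packet_loss = [0.0, 0.0, missing, 0.4]

# skipmissing drops the missing entries before averaging.
mean_loss = mean(skipmissing(packet_loss))

# count with a predicate tallies the missing entries.
n_missing = count(ismissing, packet_loss)
```

Without skipmissing, mean would return missing, since any arithmetic involving missing propagates it.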
scatter(
    valid_df.datetime,
    valid_df.received,
    label = "Received",
    ms = 2,
    mc = :red
)
plot!(
    valid_df.datetime,
    valid_df.transmitted,
    label = "Transmitted",
    linewidth = 3,
    lc = :blue
)
plot!(legend=:outerbottom, legendcolumns=2)
title!("Internet Analysis")
xlabel!("Datetime (minute interval)")
ylabel!("Packets")
ylims!(0, 11)
When there are no issues we expect 10 packets transmitted and 10 packets received - y = 10. The red dots below the blue line indicate something happened and fewer packets were received than were transmitted. There is a large gap between March 21 and March 24. Around 3/21 I turned off the computer since replacing the ethernet cable between the modem and the router seemed to fix our intermittent internet issues. However, on 3/24 at around 11 am there was an ISP internet outage in my area.
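The red dots below the blue line could also be picked out in code rather than by eye. A minimal sketch, again with made-up values standing in for the received and transmitted columns:

```julia
# Made-up values for illustration.
transmitted = [10, 10, missing, 10]
received    = [10, 6, missing, 10]

# Comparing with missing yields missing; coalesce turns that into false
# so minutes without a parsed result are skipped.
dropped = findall(coalesce.(received .< transmitted, false))
```

Here dropped holds the index of the one minute where fewer packets came back than went out.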
Summary:
From this work I gleaned that for about two weeks in March 2023 there were several instances that may indicate we had some connectivity issues. On March 24 for about 8-10 hours my household had no internet (i.e., 0 packets transmitted).
Using a bash script on a spare computer running a cron job every minute I created a log of my household’s internet health. I used Julia to prep and analyze the data and I used Quarto to prepare this document. I used GitLab to host the git repository, which helps keep the work in sync between computers. SSH would also be useful for working across computers but I didn’t set that up. I used VS Code as the IDE.
If I were so inclined, I could keep the computer on to track the health of my internet connection. I could even explore ways to host and automate refreshing this report. Alas, I probably won’t do that since my goal with this project was to practice Julia and learn how to use Quarto with Julia in VS Code. Prior to this project, I had only used Quarto in RStudio with R and Python.
If you read this, thank you :)