Skip to content
Brian Ward, Ph.D.
  • Home
  • Blog
  • CV
  • Contact
Uncategorized

Julia vs. Python

  • October 17, 2021October 17, 2021
  • by brian_ward

Previously, we went through an example of performing reverse complementation of nucleotide sequences using Python. Python is a very popular language, and for good reason. It’s easy to learn (relatively speaking), is used widely in industry and academia, and boasts hundreds of thousands of user-developed packages. In the previous post I briefly mentioned a relatively new language – Julia. Recently I’ve been reaching to Julia more and more frequently for performing tasks that I once used Python for.

This image has an empty alt attribute; its file name is Julia_caesaris.jpg
Julia, daughter of Julius Caesar and fourth wife of Pompey the Great

For me personally, my decision to transition towards Julia comes down to one main factor – I find it easier to read and write than Python (more on that below). I also don’t need many of the more specialized packages available for Python, the exception possibly being some machine learning packages such as scikit-learn. Of course the merits of each of these two languages are somewhat subjective, and the best way to learn which you prefer is to try doing a bit of programming in both.

However, before we get too invested in comparisons, I should point out that it probably doesn’t matter too much which one you learn first. Perl, Python, and Julia each have their pros and cons, but unless you happen to be developing more professional code, it’s more important to focus on the problems that you want to solve. The arcane details of the inner workings of different languages are not likely to be very important when you’re just starting out. If you happen to be working in the life sciences, each of these languages also have their own “Bio” ecosystems – BioPerl, Biopython, and BioJulia. These all contain various modules or packages that are designed to handle the files of biological data that you are likely to work with.

Reverse Complementation Script

Before proceeding, let’s create a new and improved reverse complementation script in Julia:

#!/usr/bin/env julia


"""
Nucleotide complement function

Takes as input one character corresponding to a base or IUPAC ambiguity code
and returns its complement character
"""
function complement(b::Char)

    ## Define complementation dictionary
    comp_dict = Dict(
        'A' => 'T',
        'T' => 'A',
        'C' => 'G',
        'G' => 'C',
        'U' => 'A',
        'W' => 'W',
        'S' => 'S',
        'M' => 'K',
        'K' => 'M',
        'R' => 'Y',
        'Y' => 'R',
        'B' => 'V',
        'V' => 'B',
        'D' => 'H',
        'H' => 'D',
        'N' => 'N'
    )

    ## Test if input base is lowercase
    ## using tertiary condition ? then : else syntax
    islowercase(b) ? lower = true : lower = false

    ## Convert to uppercase if required; attempt to perform complementation
    b = uppercase(b)
    if b ∈ keys(comp_dict)
        c = comp_dict[b]
    else
        error("Input should be an IUPAC nucleotide code\n
               see https://www.insdc.org/documents/feature_table.html#7.4.1")
    end

    ## Return complement (convert back to lowercase if input was lowercase)
    ## semicolons are used to place the if block on a single line
    if lower; c = lowercase(c); end
    return(c)
end


#### Executable ##########

for line in eachline(stdin)
    if startswith(line, ">")
        println(line * "_revcomp")
    else
        for base in reverse(line)
            print(complement(base))
        end
        print("\n")
    end
end

In general the process is the same as in our Python script. However, there are a few improvements. We have written a function, complement(), which does some more sophisticated handling of the input than what our Python script’s lookup dictionary did. While we still use a dictionary to perform the complementation, this time we include all the various IUPAC ambiguity codes. In addition, we can handle the RNA-specific base uracil (U), and the function will throw an exception if it encounters input that isn’t a valid IUPAC code.

Syntax

Otherwise, there are some obvious syntax differences compared to the Python script; for instance, Julia requires the keyword end to signify the end of a code block such as an if or for statement, while Python only relies on indentation. In Python you can concatenate strings together using “+”, while “*” does the equivalent in Julia. In Julia, print() will print without any delimiter by default, while println() prints and appends a newline. The syntax for defining a dictionary in Julia is also somewhat different from the syntax used to define a dictionary in Python, using the “=>” arrow symbol to denote each key/value pairing.

One nice feature included in Julia is support for many Unicode characters. At first I thought this was a bit gimmicky, but it’s actually really great. It’s much easier to read code when you can write π instead of pi, or λ instead of lambda. You can see a bit of this in the code above where we use the set membership operator, so that we write: if b ∈ keys(comp_dict) instead of if b in keys(comp_dict).

There is one subtle Julia-specific gotcha present in the code above. In Python single quotes or double quotes can both be used to denote strings. In Julia double quotes denote strings, while single quotes denote characters, so:

julia> "A" == "A"
true

julia> 'A' == 'A'
true

julia> "A" == 'A'
false

‘A’ is a character, while “A” is a string of length one. Looping through a string yields characters, and a dictionary can’t be used to replace characters with strings, or vice versa, so the dictionary above wouldn’t work for our case if we defined the letters with double quotes. We can specify that the complement() function only accept a character as input when we write ::Char to limit the type of input. In general characters can be concatenated together into strings, for instance using the join() function, but again in our case we can merely print out our complemented characters one at a time.

Another not so small difference here is that Julia is not object-oriented. In general you won’t see the object.method syntax that is common in Python, except when explicitly specifying which module or package a particular function belongs to. Rather, Julia uses a sophisticated multiple dispatch system as one of its core paradigms. Therefore functions will perform different actions based on the type of data they are called on. For instance, here we use reverse() to reverse a string, but we could also call reverse() on a vector of integers: reverse([1, 2, 3, 4]), which yields [4, 3, 2, 1]. The lack of object orientation makes Julia’s syntax somewhat more similar to R’s. I personally find this very convenient, as I also use R quite extensively.

Speed

The fastest languages are all compiled – this means that you write your code, and then run it through a program called a compiler, which translates it into machine code. Machine code is low-level, not human-interpretable, and tailored to the computer’s hardware. Machine code compiled from good higher-level code can be extremely fast. Modern compilers are highly optimized and can even catch and improve some forms of poor coding. In contrast, Python and R are both interpreted languages. In an interpreted language, a program called an interpreter directly executes code, without first converting it to machine code. Interpreted languages tend to be slower but “snappier” for interactive work.

You might be thinking to yourself that Python and R don’t seem that slow. This is partially true – a lot of functions in popular packages run very well. However, if you take a look under the hood, you’ll see something surprising: most of these functions are written in compiled languages such as C, C++, or Fortran. This has a few consequences. For one, it means that if you want to become a good Python or R developer, you must counterintuitively become proficient at these other languages. In addition, if you as an end user want to dig into a package’s functions and see how they work, you’ll be out of luck unless you’re familiar with these languages.

Julia is interesting in that it is sort of a hybrid of the two approaches. It uses a just-in-time (JIT) compiler, so when you run any code, it is first compiled and then run immediately afterwards. This has a few ramifications. First, a Julia script will typically run substantially slower the first time it is run, since it must also be compiled. If you run it again in the REPL (the read-evaluate print loop user interface), it will be much, much faster. Second, Julia can be used interactively, but in this application it definitely feels a little more clunky than a language such as python or R (though there are some workarounds to this). Julia’s developers claim that it can match the speed of older compiled languages such as C and Fortran. I don’t doubt this is the case in some scenarios. I have a harder time believing that it can generally match the speed of these languages over a wide range of applications, but maybe I’ll be proven wrong. Recently Julia became the newest language to successfully perform calculations on a supercomputer at the petaflop scale – until then the only other languages used for this feat were C, C++ and Fortran.

Below I’ll try timing our two reverse complementation scripts head to head. However, we’ll up the stakes a little bit by giving them 10 million 20-mers to reverse complement, instead of just five. Here are the timings on my four year old Toshiba laptop:

$ time cat 10million_random_20mers.fa | ./revcomp_std.py > 10M_20mers_RC.fa

real	1m41.934s
user	1m39.669s
sys	0m1.689s

$ time cat 10million_random_20mers.fa | ./revcomp_std.jl > 10M_20mers_RC.fa

real	3m2.536s
user	2m58.627s
sys	0m2.696s

Here the Julia script takes about twice as long to run as the Python script. Of course the Julia script we wrote is doing many more operations to check the input. What happens if we run a script that is identical to the Python script in Julia (or as close to identical as possible)?

#!/usr/bin/env julia

comp_dict = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C')

#### Executable ##########

for line in eachline(stdin)
    if startswith(line, ">")
        println(line * "_revcomp")
    else
        for base in reverse(line)
            print(comp_dict[base])
        end
        print("\n")
    end
end

We’ll call this script revcomp_simp.jl and test its running time on the same input file:

$ time cat 10million_random_20mers.fa | ./revcomp_simp.jl > 10M_20mers_RC.fa 

real	0m50.844s
user	0m49.598s
sys	0m1.018s

As you can see, the Julia script runs roughly twice as fast as its Python counterpart. This is highly surprising to me. I was expecting rather modest differences between the two since this is a relatively simple task. Note that because we are calling the Julia script directly, and not running it from within a REPL session, it will have to recompile every time. Therefore, the timing for the Julia script contains the compilation time.

Development Environments

Integrated Development Environments (IDEs) are programs that function as one stop shops for writing code in a particular language. Typically they include a text editor with syntax awareness for the particular language being used, a way to run code interactively, and often ways to plot output graphics. Working with R, I became spoiled by Rstudio, one of the best IDEs around. In contrast, I’ve never been able to find a python IDE that I was really comfortable with. I’ve probably come closest with Spyder, the IDE that comes bundled with Anaconda. Some other IDEs like PyCharm are overkill for my purpose, while browser-based notebooks such as Jupyter are a little too barebones for my needs.

Julia has the advantage that its help documentation and package management system are very well integrated into the basic REPL. Open a terminal, type “julia” to start the REPL, and you can then type ? to open the help prompt, ] to open the package manager prompt, or ; to open the shell prompt. In general Julia’s package management system is very well thought out, and supports the creation and use of separate, self-contained environments out of the box. On top of the REPL’s strengths, Julia has two very good development environments in Juno (a plugin for the Atom text editor) and the Julia plugin for Visual Studio Code.

Conclusion

As I mentioned earlier, the choice of which general purpose language to learn really comes down to personal preference, as well as its adoption by the people you’ll be working with. For me, Julia ticks off all the boxes, combining ease of coding with speed when its needed. However, you can’t go wrong with Python, particularly if you think you’ll be relying heavily on pre-existing packages. Ultimately, the best way to figure out which language you prefer is to try your hand at coding similar projects in each, to get a hands-on evaluation of each language’s strengths and weaknesses.

Recent Posts

  • Julia vs. Python
  • What Software Should I Learn? Pt. II
  • What Software Should I Learn? Pt. I
  • Misconceptions About Genetics and Evolution in Pop Culture, Part II
  • Investigating COVID-19 From Your Couch

Recent Comments

    Archives

    • October 2021
    • March 2021
    • January 2021
    • June 2020
    • May 2020
    • January 2020
    • October 2019
    • February 2019
    • June 2018
    • February 2018
    • December 2017
    • November 2017
    • October 2017

    Categories

    • Everyday Annoyances
    • Lab Protocols
    • Observations
    • R coding
    • shell scripting
    • Uncategorized

    Meta

    • Log in
    • Entries feed
    • Comments feed
    • WordPress.org
    LinkedIn   ResearchGate   ORCID

    © 2017 Brian Ward
    Theme by Colorlib Powered by WordPress