Code Archeology (Updated)

Posted on Thu 18 September 2014 in posts

One day I was thinking about different techniques for source code analysis, especially since we often have access to repositories and thus to the evolution of the code.

Wouldn't it be cool to see the age of certain lines of code relative to others? So I decided to create a PoC Sublime Text plugin to visualize the age of lines. I call this method Code Archeology.

And here is the result. This is the normal syntax highlighting:

code archeology normal

And this is an example which highlights the oldest parts of the code and darkens the newer lines:

code archeology visualized


Update 2014-09-24:

So it turns out that somebody thought about Code Archeology long before me: John Firebaugh, in Code Archaeology With Git. Maybe I even read this article years ago, forgot about it, and subconsciously "created" it in my head again.

And it gets worse. GitHub already has colors to indicate older and newer files.

GitHub Blame Color Encoding

But GitHub has only 10 different colors, and the narrow column doesn't really convey the information. So to finally do something "useful", I created a small JavaScript snippet that you can copy into the developer console.

It parses the <time> tag of each commit and assigns a color to each line. It also removes some of the commit info to leave more room for the code.

GitHub Blame Color PoC

snippet:

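// Spread the gray values over 205 of the 256 possible levels.
var color_range = 205;

// Next sibling that is an element node (skips text nodes).
var nextSibling = function (node) {
    if (node) {
        node = node.nextSibling;
        while (node && node.nodeType != 1) {
            node = node.nextSibling;
        }
        return node;
    }
};

// For each commit: its date and the rows that follow it within its rowspan.
var all_lines = [];
$.each($(".blame-commit time"), function (index, time) {
    var datetime = new Date(time.getAttribute("datetime"));
    var cell = time.parentNode.parentNode.parentNode;
    var remaining = cell.getAttribute("rowspan") - 1;
    var rows = [];
    var row = nextSibling(cell.parentNode);
    while (row && remaining > 0) {
        remaining--;
        rows.push(row);
        row = nextSibling(row);
    }
    all_lines.push({datetime: datetime, lines: rows});
});

// Newest commits first, so they get the darkest colors.
all_lines.sort(function (a, b) { return b.datetime - a.datetime; });

var bucket_size = all_lines.length / color_range;
$.each(all_lines, function (index, block) {
    var col = Math.floor(index / bucket_size + (255 - color_range) / 2);
    $.each(block.lines, function (i, row) {
        row.style.backgroundColor = "rgb(" + col + "," + col + "," + col + ")";
    });
});

// Remove avatars and commit messages, and narrow the commit column.
$(".commit-info img").remove();
$(".commit-body").remove();
$(".commit-info").css({width: "70px", "min-width": "70px"});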
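
The color assignment in the snippet boils down to a bit of arithmetic on the position of each commit after sorting by date. Here is a rough Python sketch of just that mapping; the function name gray_levels is made up for illustration and is not part of the snippet.

```python
import math

COLOR_RANGE = 205  # same value as color_range in the JavaScript snippet


def gray_levels(n_commits, color_range=COLOR_RANGE):
    """Return one gray value per commit, with commits ordered newest first.

    Hypothetical helper: it mirrors the formula from the snippet,
    floor(index / bucket_size + (255 - color_range) / 2).
    """
    bucket_size = n_commits / color_range
    offset = (255 - color_range) / 2  # 25 for the default color_range
    return [math.floor(i / bucket_size + offset) for i in range(n_commits)]
```

With the default range, the newest commit ends up at gray value 25 (almost black) and the oldest one close to 230 (light gray), which matches the rule that lighter means older.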

You can stop reading here, or continue if you are interested in my original crappy Sublime Text Plugin solution...


I don't really know if it will be useful in the future. But it's already fun to look over code. Here, for example, are two excerpts from OpenSSL's t1_lib.c.

Here you can see how more and more else ifs got added over time. Remember: the lighter the color, the older the code.

code archeology visualized

And here you can see that a comment was written very early, while the code changed.

code archeology old_comment

How it works

At first I created a small test repository with a simple file and a few changes. With git blame --line-porcelain -w <file> you can get the commit for each line, which can then be parsed by a Python script.

e04094852bd6b19b7c7fc0b4651d7299d3bb004e 1 1 3
author samuirai
author-time 1404753962
author-tz +0200
committer samuirai
committer-time 1404753962
committer-tz +0200
summary added MAX_NAME and missing #include
previous 5643cc394796389ed698ba17603da208466b06f0 test.c
filename test.c
    #include <stdio.h>
e04094852bd6b19b7c7fc0b4651d7299d3bb004e 2 2
author samuirai
author-time 1404753962
author-tz +0200
committer samuirai
committer-time 1404753962
committer-tz +0200
summary added MAX_NAME and missing #include
previous 5643cc394796389ed698ba17603da208466b06f0 test.c
filename test.c
    #define MAX_NAME 50
e04094852bd6b19b7c7fc0b4651d7299d3bb004e 3 3
author samuirai
author-time 1404753962
author-tz +0200
committer samuirai
committer-time 1404753962
committer-tz +0200
summary added MAX_NAME and missing #include
previous 5643cc394796389ed698ba17603da208466b06f0 test.c
filename test.c

e60aa7d9ece183db73d5728e6f5c8ebd6a9f2261 4 4 1
author samuirai
author-time 1404754156
author-tz +0200
committer samuirai
committer-time 1404754156
committer-tz +0200
summary fixed argc spelling error
previous 7cb68460a085d9535d9d27746ec9879180796b54 test.c
filename test.c
    int main(int argc, char **argv) 
...
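
Parsing that output could look roughly like the sketch below. It is not the code from my plugin; the names parse_line_porcelain and blame are made up for illustration. It assumes what git emits for --line-porcelain: every entry starts with a header line (commit hash, original line, final line, optional group size) and ends with the source line prefixed by a TAB.

```python
import re
import subprocess

# Header line of an entry: <40-hex sha> <orig line> <final line> [<lines in group>]
HEADER_RE = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")


def parse_line_porcelain(text):
    """Return one dict per source line: commit, line, author_time (if present), code."""
    entries = []
    current = None
    for raw in text.split("\n"):
        if raw.startswith("\t"):
            # The TAB-prefixed line is the source line itself and closes the entry.
            if current is not None:
                current["code"] = raw[1:]
                entries.append(current)
                current = None
            continue
        match = HEADER_RE.match(raw)
        if match and current is None:
            current = {"commit": match.group(1), "line": int(match.group(3))}
        elif current is not None and raw.startswith("author-time "):
            current["author_time"] = int(raw.split(" ", 1)[1])
    return entries


def blame(path):
    """Run git blame on path (hypothetical wrapper) and parse the result."""
    output = subprocess.check_output(
        ["git", "blame", "--line-porcelain", "-w", path],
        universal_newlines=True,
    )
    return parse_line_porcelain(output)
```

Calling blame("test.c") inside the test repository would then give a list with one entry per line, each carrying the author-time that the coloring needs.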

Now I need to group them together and assign them a color. Unfortunately this gets really ugly in Sublime :( To create a colored line I have to generate a theme on the fly with differently colored regions and assign them to the corresponding age groups afterwards:

...
</dict>
    <dict>
        <key>name</key>
        <string>git_archeology_0</string>
        <key>scope</key>
        <string>git_archeology_0</string>
        <key>settings</key>
        <dict>
            <key>background</key>
            <string>#ffffff</string>
            <key>foreground</key>
            <string>#000000</string>
        </dict>
    </dict>
    <dict>
        <key>name</key>
        <string>git_archeology_1</string>
        <key>scope</key>
        <string>git_archeology_1</string>
        <key>settings</key>
        <dict>
            <key>background</key>
            <string>#cccccc</string>
            <key>foreground</key>
            <string>#000000</string>
        </dict>
    </dict>
    <dict>
        <key>name</key>
        <string>git_archeology_2</string>
        <key>scope</key>
        <string>git_archeology_2</string>
        <key>settings</key>
        <dict>
            <key>background</key>
            <string>#999999</string>
            <key>foreground</key>
            <string>#000000</string>
        </dict>
    </dict>
    ...
</dict>
...
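
Generating those entries is mostly string templating. The following is a minimal sketch, not the plugin's actual code: theme_entries and gray_hex are hypothetical names, and the step of 0x33 between gray levels is only an assumption read off the three sample entries above.

```python
ENTRY_TEMPLATE = """    <dict>
        <key>name</key>
        <string>{scope}</string>
        <key>scope</key>
        <string>{scope}</string>
        <key>settings</key>
        <dict>
            <key>background</key>
            <string>{background}</string>
            <key>foreground</key>
            <string>#000000</string>
        </dict>
    </dict>
"""


def gray_hex(level):
    """Format a gray level (0-255) as an #rrggbb string."""
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def theme_entries(n_groups, step=0x33):
    """Build one theme entry per age group; group 0 is the lightest.

    The step size is an assumption; levels are clamped at 0 (black).
    """
    parts = []
    for group in range(n_groups):
        level = max(0, 255 - group * step)
        scope = "git_archeology_{}".format(group)
        parts.append(ENTRY_TEMPLATE.format(scope=scope, background=gray_hex(level)))
    return "".join(parts)
```

The returned string still has to be spliced into the settings of a complete theme file, which this sketch leaves out.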

Then I have to go through all the lines in the Sublime view and add the corresponding region to each one.

There are quite a few drawbacks to my method. First of all, you can only look at one file at a time: because the theme is always generated on the fly based on the number of groups I have, it changes the look of other open files that share the same dynamic theme. The generation is also slow; a file with ~5k LOC takes over 10 seconds.

I think that visualizing the age of code can be very useful, but somebody has to come up with a better way to implement it.

The PoC plugin can be downloaded here: code_archeology.py. Place the script in ~/Library/Application Support/Sublime Text 3/Packages/User. Then open a file in a git repository, press ctrl+` to open the console, and run it with view.run_command("example"). To reset the view, use view.run_command("example", {'reset': True}). But this should never ever ever be used by anybody. It's buggy and will probably only work on my machine. I just don't want to hold back any information.