Thursday, 16 May 2013

A little python vs ruby

Just a little comparison of python vs ruby which nicely sums up why I prefer ruby to python.

Coming from a perl background, parsing strings and manipulating data structures is the bread and butter of sysadmin scripting. Here we're going to split a string, add to the array, then join into a string again.

First, in python:

a = "a b c".split()
>>> a
['a', 'b', 'c']
>>> len(a)
3
>>> a.append("d")
>>> ",".join(a)
'a,b,c,d'

Ok, fair enough I guess. Now lets look at ruby:

a = "a b c".split
=> ["a", "b", "c"] 
a.size
=> 3 
a.push "d" 
=> ["a", "b", "c", "d"] 
a.join "," 
=> "a,b,c,d" 

Notice the difference? I'll try to summarise it in text.

For python, we split the string to get an array. We then run a global method to find the size of the array. We then call a method on array to append the new element. Finally we run a join method on a string and pass it the array.

For ruby, we split the string to get an array. We then run a method on the array to get the size (.length would also have worked). We then run a method on the array to add the element. We then run a method on the array to join to get our final string.

To me ruby is far more consistent. I'm not saying python is a bad language, I just find ruby to be particularly enjoyable to use.

Criticisms, suggestions and improvements welcome :)

Tuesday, 16 April 2013

Passphrase Generators Part II

A while back I wrote a post about using bash as a passphrase generator (here). The idea being that it would be better to pick "randomly" from a list of common words than it would be to try to mentally choose four words at random. Humans are not very good a being random [1]. Derren Brown, if you're reading this, I'd love to see an episode where you plant a password into someone's head for something they believe is of utmost importance. This post addresses some security concerns about how the words are chosen at random.

Speaking of passphrases and passwords, I should probably clarify what I mean by the two terms. A password I take to mean a private string which more or less resembles a "word" in the traditional sense. Typical and poor examples would be "password", "Password1", "aksdfjslfjdf" etc. Something like "8yDcBPAllEFOLNNI89ZN4nq5" being a good example. A passphrase I take to mean a private string composed of multiple "words". For example "That's me in the corner, that's me in the spot light", "slgl sdf mcr asrj" are poor examples, and "Cupertino teenage fell Paradox" is a good example. The analogy being to word and phrase in spoken languages. So generally, a good passphrase will be better than a good password because of length, assuming you havn't done something silly like using the lyrics of your favorite song. The passphrase would still need to be random, but easier to remember.

Anyway, having spent some time chatting on IRC about the flaw of using % 10000, using Bash's $RANDOM variable and a few other things, I thought I would update the script to be more secure. The entropy lost from using % 10000 isn't particularly significant, but the way $RANDOM works means you could probably reduce the choice of possible words significantly if you knew when approximately the passphrase was being generated. Basically, $RANDOM seeds off gettimeofday's seconds, useconds and the bash PID. I haven't done the maths for exactly what the impact is, but I thought it would be worth this blog post.

I am actually going to provide two new scripts. The previous method used a dictionary file with 10000 common words downloaded from the internet, and embedded them in the Bash script. Now, I still think this is a good approach because 10000^4 still gives enough possible combinations and because there are only 10000 words, many words will be familiar to most people and should therefore be easier to remember. I will first present an alternative approach.


#!/bin/bash
words=${1:-4}

while true; do
    clear
    for i in $(seq $words); do
        word=$(/usr/bin/shuf -n1 --random-source=/dev/urandom /usr/share/dict/words)
        echo -n "$word "
    done
    echo ""
    read -p "Accept (y/n) [n]: " answer

    if [ "$answer" == "y" ]; then
        exit 0
    fi

done

You will probably notice two things. Firstly, it only takes one argument; the number of words to show, or 4 by default. The previous version of the script printed X number of words on Y lines so that you could choose a passphrase that suited you. However, there was always the temptation to create a new passphrase from a combination of what was shown on the screen. This is not ideal. You are better off rejecting a number of whole passphrases and then accepting one as-is, rather than combining several passphrases into one. The reason being that the process of combining passphrases is no longer random and there will probably be a bias towards a structure such as "adjective noun verb adjective". Or something like that anyway.

So, instead, we just keep looping until you accept the passphrase and clear the screen if you reject it (the default). By clearing the screen I hope to lessen the chance of you combining several passphrase - out of sight, out of mind as they say.

The second major change here is that instead of using 10000 common words downloaded from the Internet, we're using the OS's provided dictionary file. If your distribution doesn't have that file, see here http://en.wikipedia.org/wiki/Words_(Unix).

We use the shuf command X number of times to select X words for the passphrase. On one of my machines there are 98326 words in /usr/share/dict/words. I think I saw many more on my Ubuntu system. So you can probably expect some pretty obscure passphrases. You'll also note that we tell shuf to use /dev/urandom as the source of randomness as the built in source isn't secure.

Right, now onto the other option. In this case we're going to just modify the previous script and fix the source of randomness. We'll also introduce the while true loop and hide previous rejected passphrases.

#!/bin/bash

dict=$1
script=$2

chars="a-zA-Z0-9 "
size=$(cat $dict | wc -l)
echo "#!/bin/bash" > $script
( echo -n "dict=( "; i=0; cat $dict | while read word; do echo -n "$word " | tr -c -d "$chars"; if [ "$i" -eq "10" ]; then i=1; echo ""; fi; ((i++)); done; echo ")" ) >> $script

echo "size=$size" >> $script
cat << 'EOF' >> $script
words=${1:-4}

while true; do
    clear
    for i in $(seq $words); do
        index=$(cat /dev/urandom | tr -d -c '0-9' | fold -w 4 | head -1 | sed 's/^0*//')
        word=${dict[$index]}
        echo -n "$word "
    done
    echo ""
    read -p "Accept (y/n) [n]: " answer
    if [ "$answer" == "y" ]; then
        exit 0
    fi
done
EOF
chmod +x $script

Other than the loop change, the only difference is our source of randomness. Instead of using $RANDOM we read /dev/urandom directly. We strip out anything other than digits using the tr command and fold it to 4 characters (0000 to 9999). We then just take the first result and strip off any zeros from the front. We do this because bash doesn't treat 0080 as 80 and isn't happy using 0080 as an index. There is probably a better way of stripping of the zeros using bash's string manipulation, but this will do for now.

Running this script is the same as the previous version, you give it the dictionary and an output file name. E.g.

$ wget http://wortschatz.uni-leipzig.de/Papers/top10000en.txt
$ ./make.sh top10000en.txt ppgen.sh

Unfortunately the dictionary isn't provided over HTTPS, so it is possible that someone is going to MITM your download and change all the words to suit themselves.

Perhaps something like this might help:

$ wc -l top10000en.txt
10000 top10000en.txt


 $ sort < top10000en.txt | uniq | wc -l
10000


That way at least you know there are 10000 unique words. Either that or find a new dictionary available over HTTPS (and let me know where you found it if possible).

Hopefully you'll find either of these approaches useful.


[1] http://www.ncbi.nlm.nih.gov/pubmed/17888582

Thursday, 14 March 2013

Ultra Light Graphs (ULG)

Ultra Light Graphs, or ULG for short, is a tool that allows you to create basic graphs using nothing more than a few lines of text. The text format is designed to be meaningful in itself, which makes ULG a sort of markup language for graphs. It isn't intended to replace something more comprehensive like graphml or dot, but it does allow you to quickly write something that is readable in your favourite text editor and can be turned into an image that you might want to insert into documentation etc.

Edit:

Installation can do be done using ruby gems:

$ gem install ulg

Using ULG


ULG is very easy to use. Just give it a file with the text in it and it will generate a png by default.


$ ulg -h
Usage: ulg.rb [-o OUTPUT] [-s FILE] [-d] FILE FILE...

Reads from stdin if no input FILE given

Other options:

  -o OUTPUT  Output format (png, dot, svg). Default png
  -s FILE    Save to file
  -d         Debug mode
  -h         This help



For example

$ ulg basic.ulg
Reading basic.ulg
Saving output to basic.ulg.png

Basic example


Right, lets say you want to draw a basic graph. You open up your favourite text editor such as vim. You want two nodes and an arrow, so you might intuitively type something like this:

internet ==> web dmz

Now, this is what ULG would make of just that one line:

Nice and simple.

Node shapes


Maybe you don't want two boxes. Perhaps you want one of them to be oval. That's easy too:

(internet) ==> [web dmz]

That would give you:


Notice also how we've also explicitly marked "web dmz" as a box.

Edge label


You've probably noticed that the arrow, or edge in graph-speak, doesn't have a label. We can give it one by just inserting some text somewhere inside the arrow:

(internet) ==http==> [web dmz]

This would give us:


Arrows


You can also control the shape of the arrow head, for example:

(internet) ==http==<> [web dmz]

Gives us:


Edge line


Of course, you might like your lines to be dashed instead of solid. No problem:

(internet) --http--> [web dmz]

Here we go:


Interesting example


Now, lets make things a little bit more interesting. I'll show you the text and image first, then I'll explain a few things afterwards:


option layout dot
option overlap false
option splines true

include backoffice.ulg

Internet        ==http==>       (web dmz|orange)

web dmz         ==8080==>       [app stack 1|blue]
web dmz         ==8000==>       [app stack 2|purple]

app stack 1     ==25==>         internet
app stack 2     ==5432|red==>   [database|green]

Where backoffice.ulg is


(back office) ==5432|red==> database
(back office) ==80==> (internet|red)

This would result in this rather pretty graph:


Quite a few things have happened here. Looking at the text, you will have noticed three lines starting with the word "option". These are basically just option names and values that get passed directly to Graphviz as global options.

We also include a ULG file called "backoffice.ulg", the contents of which is shown afterwards. You'll see that backoffice defines an "internet" node and the main text refers to an "Internet" and an "internet" node and that the graph only shows one "internet" node. This is because ULG ignores case and spaces when determining what the node is. So "Internet", "internet" and "inter net" would all be the same thing. This allows you to be a little be messy and inconsistent with your typing ;)

The "internet" node was labelled "internet" and not "Internet". This is because ULG defines styles based on the first occurrence of the node or edge. Because we included backoffice.ulg first, it defined the style for the internet node. This also means that once we have defined a node, such as the "(web dmz|orange)" we ignore any style characters and simply use the label. The next two occurrences were referred to as just "web dmz".

Another thing you will have noticed is the use of colour. This is achieved by simply adding "|colour" to the node or edge label. Any colours Graphviz understands should be fine.

Its probably also worth mentioning at this point that ULG doesn't care about whitespace between nodes and arrows, so you can use spaces or tabs - whatever is easiest to read. It does however require that an arrow is made up of at least two line characters (such as == or --)

Other things you can do


Finally, this example shows the different combinations of supported node and arrow styles.


# This line is ignored

# Include a file
include common-opts.ulg

# An option
option rankdir TB

# Graph
[box]   = normal ==>    (oval|blue)
oval    =dot|red=o      <diamond>
oval    ..box....x      box
box     --diamond--<>   diamond

another box     === no arrow ==     {hexagon}

This gives us:


Hopefully this example speaks for itself.



Monday, 11 March 2013

OpSec Basics

This is a simple little list of OpSec ideas that should be followed in most situations. Nothing ground breaking, just a gentle reminder :)

Take regular backups


Regularly backup systems and data using a combination of incremental and full backups.

Encrypt transmitted data whenever possible


Even on local networks. You may have perimeter firewalls and a switched network, but you have to assume someone is listening.

Minimize software to minimize vulnerability


Reduce the attack surface for remote access and local privilege escalation by only having essential software installed and services running.

Run different network services on separate systems


Not only does this simplify diagnosing operational issues, but means a compromise of one service doesn't directly affect other services. It also reduces the possibility of another service being used in the next stage of an attack.

Least privilege


This should apply at all levels, from system accounts to application logins. Don't use shared credentials and use systems that have granular enough permissions. Someone might not want to do something malicious, but if their system has been compromised the attacker might.

Patch your sh!#


Unpatched software is a liability. Every unpatched system is low-hanging fruit for an attacker.

Be informed


You should have a clear insight into the standard operational and opsec behaviour of your environment. This means having good operational monitoring and metrics, security systems such as HIDS and NIDS and a hardened centralized log collector for system logs and SIEM. Once you have that, take the time to review logs, events and set up alerts if possible.

Segment your network


Compromise is inevitable  so the trick is to limit damage once it happens. Network segmentation stops the attack from moving around freely and thus adds complexity and delay to the next stages of their attack.

Use good crypto and don't roll your own


This applies to everything from choosing a password, to storing passwords and encrypting backups. Don't rely on home-made encryption or obfuscation tricks.

Have a response


It is a question of when you get hacked, not if. You should have an incident response plan well in advance of anything happening. Basics should include how to limit damage, how to collect information for forensics or the authorities and details of who to contact.

Default configuration

When you buy a new piece of kit, build a new system or install an application, or should remember to check and adjust any default configuration. Most things try to be easy, not secure, so leaving in default configuration can be dangerous. Default users are a classic example, but other more subtle configuration might give too much information away or might be permissive in terms of access or actions.


Sources:

Saturday, 22 September 2012

What I like about my 2002 Volvo S40 SE 2.0 Auto


Ok, so, for a completely off-topic post about a car. I bought a 2002 Volvo S40 SE 2.0 Automatic a few weeks ago for family day trips etc and not commuting. So I wanted something cheap, comfortable and safe. This car seems to pretty much fit exactly, so much so that I thought I'd share some of the things I like about it. Perhaps you are in a similar situation and are considering buying a second-hand Volvo.



Comfort

For long day trips or weekends away this is quite important. Bouncing down the motorway in a old Fiesta isn't exactly comfortable. So, what does my car give me?

  • Climate control - Not just aircon, but proper climate control. Set it to AUTO, choose a temp and voila.
  • Automatic - Not to everyone's taste, but I'm kinda liking it. Makes traffic jams bearable.
  • Cruise control - Haven't used it yet, but nice idea for those long motorway journeys.
  • Heated front seats - Still too warm for them, but I think they'll come in handy in the winter :)
  • Adjustable dashboard light brightness - Nice little touch.
  • CD and tape player - CD player is a requirement, and the tape player is even handy for playing mp3s or Spotify from my phone.
  • 2.0 Engine - Car is pretty responsive.
  • Heated door mirrors - Again, too warm to have used them yet, but will come in handy later.

Safety

The most important thing as a family man is of course safety.

  • Side markers - Little orange lights on the sides of the car at the front and back to help visibility of the car from the side. Required in North America but not in Europe, I didn't see a single other car last night with them whilst walking in town.
  • Headlight wipers - Not only do they look kinda cool, but sort of handy.
  • Driver and passenger airbags - The basics.
  • Side impact cushions - A nice extra little touch, adding to safety.
  • 4 star Euro NCAP rating - "The S40 was awarded four stars for protection in frontal and side impacts, the only one of 13 family cars tested to achieve this result"

Family Friendly

The idea was to get a car that was big enough for day trips without being too big for the terrible on road parking where I live.
  • Parking voucher clip - Silly, but very cool. A little clip on the right of the windscreen to place parking vouchers etc. Handy, really handy and a nice little touch.
  • Boot size - 415 litres, big enough to put the pram base in without taking off the wheels.
  • Lots of random storage - Little storage spaces throughout the car.
  • Cup holders in central arm rest - Another nice little touch.
  • Full electric windows with driver-side control - All four door windows are electric.
  • Leather seats - Easy to clean and spill friendly, perfect for kids in the back.

Finally, and not in any particular category, awesome advice from the excellent community at volvoforums.org.uk

Sunday, 19 August 2012

Visualisation with Scratch

Scratch allows you to create interactive stories, games, music and art using drag-and-drop programming. It is an excellent way to introduce basic programming concepts, such as variables and loops. But, it is also quite powerful, as it supports bi-directional communication to such things as Lego Mindstorms, and more generally, using TCP/IP. I decided to see what sort of data visualisation I could achieve with a few spare minutes and a bit of python code.


The first attempt ended up as a sort of "meme cloud". Here, you send Scratch an alert level and a message, and it will display a meme based on the level, with the message in a speech bubble.The levels are:

The most recent alert is shown with no transparency and with the speech bubble. As the alerts get older, they become increasingly transparent. Positions of the memes are random. It currently only supports 10 sprites, which means it will show up to 10 memes at any one time.

Here is a section of the code for one of the sprites:


The problem with this approach is that Scratch doesn't allow you to dynamically create sprites, which sort of limits the functionality for certain things. However, there is an alternative. A sprite has, amongst all sorts of things, a pen. This pen can have colour, thickness, shade etc. Moving the sprite while the pen is "down" basically just draws on the screen. So, although we can't dynamically create sprites, we can draw all sorts of things programmatically. This takes us to attempt number two.

The plan here was to draw some sort of time chart or graph. A python script sends Scratch a pen size and colour for every event, and Scratch draws this. As it receives events, the pen moves to the right to create a sort of time chart. If the pen gets to the end of a line, it drops down to a new line. If it gets to the bottom of the screen, it goes back to the starting position in to top left corner. This means you can see the details of a relatively large number of events on a single screen, in order of time.

The example video below just uses random colours and pen sizes, so you won't be able to see any patterns. However, with real data you should be able to see correlations such as spikes in the number of errors, or outages etc. Also, the time between events in the video is fixed, so it looks a little boring. The white square shows the current position.




Not bad for a bit of dragging-and-dropping, and some python. I'll leave it as an exercise for the reader to integrate it into something like Splunk or Nagios :)

You can download the meme cloud python here, and the meme cloud scratch file here. The line viz python can be downloaded here, the scratch file here.

Wednesday, 18 January 2012

Passphrase generator in bash

[EDIT]

The script below has been improved upon and can be found here: http://blog.0x10.co.uk/2013/04/passphrase-generators-part-ii.html

Passwords are dead. Long live passphrases. Or something like that anyway. As perfectly summarized by XKCD here http://xkcd.com/936/, passwords are hard to remember and easy to guess. Passphrases on the other hand are easy to remember and hard to guess.
To help generate passphrases, I've put together a little bash script.
Firstly, download the dictionary containing 10000 most common English words. If you have a 4 word passphrase, thats 10^16 possible combinations.
$ wget http://wortschatz.uni-leipzig.de/Papers/top10000en.txt
Now save the following script:
 
#!/bin/bash

dict=$1
script=$2

chars="a-zA-Z0-9 "
size=$(cat $dict | wc -l)
echo "#!/bin/bash" > $script
( echo -n "dict=( "; i=0; cat $dict | while read word; do echo -n "$word " | tr -c -d "$chars"; if [ "$i" -eq "10" ]; then i=1; echo ""; fi; ((i++)); done; echo ")" ) >> $script

echo "size=$size" >> $script
echo 'count=${1:-10}
words=${2:-4}

for i in $(seq 1 $count); do
    first=1
    for j in $(seq 1 $words); do
        r=$RANDOM
        let "r %= $size"
        word=${dict[$r]}
        if [ "$first" == "0" ]; then
            echo -n " "
                fi
        echo -n "$word"
        first=0
    done
    echo ""
done
' >> $script
chmod +x $script
Using modulo on a random number? tut tut tut. Anyway, I've called this script make.sh.
Now run it, giving it the dictionary file and an output script name:
$ ./make.sh top10000en.txt ppgen.sh
What it basically does is embed an array of words into ppgen.sh plus the code necessary to show passphrases.
It will look something like:
#!/bin/bash
dict=( the of to and a in for is The that on 
said with be was by as are at from 
it has an have will or its he not 
were which this but can more his been would 
...
definite fragile rewards antiabortion respects careers backers seize inefficient 
conceptual densities EPS Me Sparc spirits experimentally shallow )
size=10000
count=${1:-10}
words=${2:-4}

for i in $(seq 1 $count); do
    first=1
    for j in $(seq 1 $words); do
        r=$RANDOM
        let "r %= $size"
        word=${dict[$r]}
        if [ "$first" == "0" ]; then
            echo -n " "
                fi
        echo -n "$word"
        first=0
    done
    echo ""
done
We can now run it:
$ ./ppgen.sh 
Does Brazil quotas message
amount losers sing Oregon
patient misleading rebels smoking
refuse subordinated clerk wrote
have handle Rockwell strong
Committee Insurance consumer else
tariffs Unit mouth loved
V singled graduate kidney
uranium Hungary scientists Mar
fined Hercules Kemp Indian
Or, if we wanted to create just one 5 word phrase:
$ ./ppgen.sh 1 5
Commodity dioxide Boris unit drop

Thursday, 12 January 2012

Visualising rules from Firewall Builder - Part 1

Firewall Builder (available here http://www.fwbuilder.org/) is a GUI that lets you configure a variety of firewall devices, for example iptables on a linux box or Cisco ASAs.

The idea is to take the configuration file from Firewall Builder and generate some sort of rule visualisation. The ruleset I'm using is basically just the built-in template called "fw template 3". I then saved the ruleset as testfw.fwb.

So, lets process the file:

user@host:~/fwbvis$ ./fwbvis.py testfw.fwb ignoreme testfw.dot
user@host:~/fwbvis$ 

Where the arguments are as follows:

  • fwbvis.py - the script (below)
  • testfw.fwb - the firewall config saved by Firewall Builder
  • ignoreme - to be ignored for now. It will become a configuration file for colours etc.
  • testfw.dot - the output file. We just generate a png at the moment, in this case called testfw.dot.png

Here is the result:

It isn't perfect, but it is a pretty good start. The colours are as follows:

  • red - reject
  • orange - deny
  • green - accept
  • blue - anything else

So, here is the script so far. It's pretty ugly btw...

#!/usr/bin/python

import sys
import pydot
from xml.etree.ElementTree import ElementTree

sFWFile = sys.argv[1];
sConfigFile = sys.argv[2];
sOutputFile = sys.argv[3];

try:
    ds = open(sFWFile)
except:
    print "Could not open file", sFWFile
    sys.exit(1)

try:
    tree = ElementTree()
    tree.parse(sFWFile)
except:
    print "Could not parse file", sFWFile
    sys.exit(1)

graph = pydot.Dot(graph_type='digraph')

prefix = ""
rules = []
objects = {}
for element in tree.iter():
    tag = element.tag
    if tag[0] == "{":
        uri,tag = tag[1:].split("}")
        if prefix == "":
            prefix = uri
    oid = element.get('id')
    if oid is None:
        next
    objects[oid] = {'tag': tag }
    for (key,value) in element.items():
        if key != 'id':
            objects[oid][key] = value

for rule in tree.findall('.//{%s}PolicyRule' % prefix):
    oid = rule.get('id')
    action = objects[oid]['action']
    color = "black"
    if action == "Reject":
        color = "red"
    elif action == "Deny":
        color = "orange"
    elif action == "Accept":
        color = "green"
    else:
        color = "blue"

    src = rule.find('.//{%s}Src' % prefix).find('.//{%s}ObjectRef' % prefix).get("ref")
    if src in objects:
        src = objects[src]['name']
    dst = rule.find('.//{%s}Dst' % prefix).find('.//{%s}ObjectRef' % prefix).get("ref")
    if dst in objects:
        dst = objects[dst]['name']
    srv = rule.find('.//{%s}Srv' % prefix).find('.//{%s}ServiceRef' % prefix).get("ref")
    if srv in objects:
        srv = objects[srv]['name']
    else:
        srv = "unknown"
    graph.add_edge(pydot.Edge(src, dst, label=srv, color=color))

#graph.write(sOutputFile)
graph.write_png(sOutputFile+".png")

So, there you have it. More to follow as things improve.

Sunday, 1 January 2012

Some bash array functions

Following on from the previous post, here are some functions for bash arrays. I'll add to this post later with more functions.
Test if a variable is an array:
# is_array - Tests whether variable name is an array
# Parameters:
#   name - name of variable to test
# Returns:
#   0 if an array
#   1 otherwise
is_array()
{
    # Only takes one argument
    if [ "${#}" -ne "1" ]; then
        return 1
    fi

    # Name of the variable to test
    local name="$1"; shift
    # Run the declare command against the variable
    local x=$(declare -p $name 2>/dev/null)
    # If it errored then the name probably isn't even
    # a variable
    if [ "$?" -ne "0" ]; then
        return 1
    else
        # It is at least declared, test if it is an array
        if [ "${x:8:2}" == "-a" ]; then
            return 0
        else
            return 1
        fi
    fi
}
Push values onto an array:
# array_push - Pushes one or more values onto an array
# Parameters:
#   name - name of the destination array
#   ...  - values to push onto array
# Returns:
#   Nothing
array_push()
{
    # Need two or more arguments
    if [ "${#}" -lt "2" ]; then
        return
    fi

    # Name of the destination array
    local name="$1"; shift
    # Number of elements we're going to push
    local numElements=${#}
    # Counter
    local i=0
    # Starting position of destination array is the
    # size of the array
    eval local start=\$\{\#$name\[\@\]\}
    # Iterate over the elements supplied, adding them
    # to the destination array
    while [ "$i" -lt "$numElements" ]; do
        pos=$((start+i))
        eval $name\[$pos\]=\"$1\";
        shift
        ((i++))
    done
}
Pop from an array:
# array_pop - pops a value from the end of the array
# Parameters:
#   name - name of array
#   dest - optional name of variable to put value in
# Returns:
#   Nothing
array_pop()
{
    # Need at least one argument
    if [ "${#}" -eq "0" ]; then
        return
    fi

    # Name of the array
    local name="$1"; shift
    # If we have a second argument, use it to write value
    if [ -n "$1" ]; then
        local dest="$1"; shift
    else
        local dest=""
    fi
    # Size of the array
    eval local numElements=\$\{\#$name\[\@\]\}
    # Index of element is one less than size
    local index=$((numElements-1))
    # Get the value
    eval local value=\"\$\{$name\[$index\]\}\"
    if [ -n "$dest" ]; then
        # Update the specified dest variable
        eval $dest=\"$value\"
    else
        # Simply echo the value
        echo "$value"
    fi
    # Unset from the array
    eval unset $name\[$index\]
}
One thing to note is the clash between variable names. E.g. if you wanted to return the value into the name variable, it would be bad to call that variable 'value'. local doesn't help here, best option would probably to use obscure variable names that are unlikely to clash.
Walk an array:
# array_walk - Walks through an array, calling the
#              callback function for each element
# Parameters:
#   name     - name of array to walk
#   callback - name of callback function
#   ...      - any parameters to be passed to callback
# Returns:
#   Nothing
array_walk()
{
    # Name of array to walk over
    local name="$1"; shift
    # Name of call back function
    local callback="$1"; shift
    # TODO - We should probably check whether callback
    # is actually a function

    # Number of elements in array
    eval local numElements=\$\{\#$name\[\@\]\}
    # Counter
    local i=0
    # Iterate over the elements in the array
    while [ "$i" -lt "$numElements" ]; do
        # Get the value
        eval value=$\{$name\[$i\]\}
        # Check whether it is an array
        is_array $value
        if [ "$?" -eq "0" ]; then
            # If so, recursively walk that array
            array_walk "$value" "$callback" "$@"
        else
            # Just a variable, so call the callback
            # function giving it the args and value
            $callback "$@" "$value"
        fi
        ((i++))
    done
}
Now, a little play with those functions:
declare -a aTest

echo First push
array_push aTest 'Hello' 'there' 'you' ':)'
echo new array is ${aTest[@]}
echo

declare -a aTest2
echo Second push
array_push aTest2 'How' 'are' 'you today?'
echo second array is ${aTest2[@]}
echo

echo Third push to add second array to the first
array_push aTest 'aTest2'
echo first array is ${aTest[@]}
echo

echo Defining destination array
declare -a aMerged
echo defining callback function
merge()
{
    local name="$1"
    local value="$2"
    # Simple push the value onto the destination
    array_push "$name" "$value"
}
echo Doing walk
array_walk aTest merge aMerged
echo merged array is ${aMerged[@]}
echo

echo Popping last element of merged array by reference
array_pop aMerged val
echo Value is $val
echo and again
array_pop aMerged val
echo Value is $val
echo merged array is now ${aMerged[@]}
Running it gives us:
$ ./arrays2.sh 
First push
new array is Hello there you :)

Second push
second array is How are you today?

Third push to add second array to the first
first array is Hello there you :) aTest2

Defining destination array
defining callback function
Doing walk
merged array is Hello there you :) How are you today?

Popping last element of merged array by reference
Value is you today?
and again
Value is are
merged array is now Hello there you :) How
$
So that sort of works. Next I should add an array_indexes() and array_values(). Oh, and happy new year :D
Edit:
Added array_pop function.

Thursday, 29 December 2011

A different 2D array in bash

I have recently been writing a pretend Object Oriented Programming layer on top of bash. Basically, it uses some global vars, function-within-functions and a ton of eval to create the illusion of OOP in bash.

Something like:

#!/usr/lib/oobash.sh

# Import the Car class
`import Car`

# Create a new "object"
new Car as mycar

# Set a variable
mycar.speed 100

# Call a function
mycar.drive

This will be made available here http://code.google.com/p/object-oriented-bash/ soon, but in the meantime I have needed to implement a 2D array in bash. A quick Google search shows some different ways of doing this. Firstly is to dynamically create arrays using eval. The other approach is to use a huge offset, e.g. 1000, then refer to array[i][j] as array[i*1000+j] etc. The latter example requires you to keep track of two indexes which is sort of annoying.

Edit:

I noticed someone has already done something similar at http://oobash.sourceforge.net/. I'l take a look at it some point and if it does the job I'll abandon my project.

So, in an attempt to find something different, I came up with the following. The idea is to store some metadata about how many elements we're adding.

An array with just one element might look like this:

[0] = 1
[1] = "Hello there"

The element at index 0 is the number of elements that follow. As the value is 1, the next element contains the value, which would basically be a scalar. Now, an array might look like this:

[0] = 2
[1] = "Hello"
[2] = "there"

Element 0 tells us to expect 2 values. Nice and simple huh? A more complete example:

[0] = 1
[1] = "Hello"
[2] = 3
[3] = "How"
[4] = "are"
[5] = "you?"

You get the idea. But, we have a problem. With arrays, one often wants to pop from the end of the array. In order to do that, we have to start at the end and work backwards. But the following example would break that:

[0] = 1
[1] = "Hi"
[2] = 3
[3] = "aaa"
[4] = 1
[5] = "bbb"

If we worked backwards from 5, we would get to element 4, which looks like an array count and conclude that (4,5) was a scalar pair. That isn't actually the case. But, there is a solution.

[0] = 1
[1] = "Hi"
[2] = 1
[3] = 3
[4] = "aaa"
[5] = 1
[6] = "bbb"
[7] = 3

At the cost of some space, we've added a count element at the end of each array. That way we can go to the last element, and know how far backwards to go.

Well, that's the theory. Now lets look at a push:

#!/bin/bash

# Function: array_push()
# Params:
#   name     - name of array receiving data
#   indexVar - name of variable to store index used
#   ...      - list of one or more values to be pushed
# Returns:
#   Nothing really
array_push()
{
    # Require at least 3 parameters
    if [ "${#}" -lt "3" ]; then
        return
    else
        # Set parameter variables
        local name="$1"; shift
        local indexVar="$1"; shift

        # Get the next index
        eval local index=\$\{\#$name\[\@\]\}

        # Make a note of how many elements we're pushing
        local count="${#}"

        # Set the index var to our index. This can then be used later 
        # to retrieve data from the array.
        eval $indexVar=$index

        # Record the count
        eval $name\[$index\]=$count

        # Moving up in the array
        ((index++))

        # We now iterate over the values, adding them to the array
        end=$((index+count))
        while [ "$index" -lt "$end" ]; do
            eval $name\[$index\]=\"$1\"
            shift
            ((index++))
        done

        # Finally, make another record of the count
        eval $name\[$index\]=$count
    fi
}

declare -a aTest

echo First push
array_push aTest iIndexA "Good day."

echo "index: $iIndexA"
echo new array is ${aTest[@]}
echo

echo Second push
array_push aTest iIndexB "To" "you" "sir!"

echo "index: $iIndexB"
echo new array is ${aTest[@]}

Running it gives us:

$ ./push.sh 
First push
index: 0
new array is 1 Good day. 1

Second push
index: 3
new array is 1 Good day. 1 3 To you sir! 3
$

Wonderful. We're not doing anything useful, but the data is there. It was at this point that I got bored of this approach. The problem is that doesn't extend to beyond two dimensions. The only way around this would be some sort of JSON like beginning/end tags instead of a count. E.g.

[0] = {
[1] = 'a'
[2] = }
[3] = {
[4] = 'b'
[5] = 'c'
[6] = {
[7] = 'd'
[8] = }
[9] = }

This is could get a bit slow to process. We'd have to do a lot of string comparisons, and we'd need to quote values to prevent confusion.

Alternatively, we could use an array as usual, but have our functions automatically test values to see if they're an array. That blog post shall follow shortly.