Cyber Security

a discussion about better ways of doing things

System number maps

In a recent post (Subfile Lookup Tables) I mentioned system number maps. Let’s talk a little more about these.



Each node in the nodal structure is assigned a unique number known as its system number. This is an integer value which is quite separate from its nodal number, such as 1.2.3, and is usually assigned in the order in which it is created. The first node to be created would have system number 1, the second would have system number 2, and so on. The reason for system numbers is that nodes can be moved around in the structure, so their nodal numbers can change. System numbers, in contrast, never change, no matter where the node is located in the structure. If you think of nodal numbers as street addresses, system numbers would be the equivalent of social insurance numbers, or whatever is the equivalent government identification number in your country. If you move from one house to another, your street address will change, but not your government ID number.
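To make the idea concrete, here's a minimal Python sketch (the class and function names are my own illustration, not part of any particular implementation): the system number is stored on the node itself and never changes, while the nodal number is simply computed from the node's current position in the tree.

```python
# A node's identity is its system number; its nodal number is just
# its current address, derived from its position in the tree.

class Node:
    def __init__(self, system_number):
        self.system_number = system_number  # permanent identity
        self.children = []                  # ordered list of child Nodes

def nodal_number(root, target, prefix=()):
    """Return the current nodal number of `target` as a tuple,
    e.g. (1, 2, 3) for node 1.2.3, searching the tree under `root`.
    Returns None if the node isn't in this tree."""
    for i, child in enumerate(root.children, start=1):
        path = prefix + (i,)
        if child is target:
            return path
        found = nodal_number(child, target, path)
        if found:
            return found
    return None
```

Moving a node to a different place in the tree changes what `nodal_number` returns for it, but its `system_number` stays the same, which is the whole point.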



Referring back to the previous post, let’s assume that node 1.3 has been exported as subfile 2, and suppose its system number in the master file is 8. Subfile 2 is then expanded by its owner, and in due course is imported back into the master file. In the interim, let’s suppose that the master file owner has added a new node to the 1.1/1.2/1.3 sibling group, and decided that it should go immediately after node 1.2. This means that the new node is now 1.3, and the original 1.3 has become 1.4:

Fig 1

If subfile 2 were imported back into node 1.3, there would be a major mix-up. In reality, however, subfile 2 is imported back into the node with system number 8, which is now node 1.4, so the data ends up in the correct place.


Let’s look at what happens to subfile 2 before it is imported back again. Its owner decides to add three new nodes, which will be 1.3.1, 1.3.2 and 1.3.3:


These are all given appropriate system numbers. The original top-level node 1.3 becomes system number 1 in the subfile, and the new nodes become system numbers 2, 3 and 4 respectively. It doesn’t matter that system numbers in the subfile don’t correspond to system numbers in the parent file, because we are going to create a map translating one to the other.


When the subfile is first created, the system number map will consist of a single entry: system number 1 in the subfile relates to system number 8 in the parent file. We then add three more nodes in the subfile. At this point they have no equivalent numbers in the parent file because the subfile has not yet been imported back into the parent. However, when import occurs, these new nodes are given the first three available system numbers in the parent file – let’s say these are 45, 46 and 47. The system number map after import will then record that system numbers 2, 3 and 4 in the subfile correspond to 45, 46 and 47 in the parent file.
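In code, the map can be as simple as a dictionary from subfile system numbers to parent system numbers. The sketch below follows the numbers in the example above; the allocator argument stands in for however the parent file actually hands out free system numbers:

```python
# System number map: subfile system number -> parent system number.
# At export, only the subfile's root node has a parent counterpart.
sys_map = {1: 8}   # subfile node 1 corresponds to parent node 8

def import_new_nodes(sys_map, subfile_numbers, next_parent_sys):
    """On import, give each not-yet-mapped subfile node the next free
    system number in the parent file and record the pairing.
    Returns the next free parent system number after the import."""
    for sub_sys in subfile_numbers:
        if sub_sys not in sys_map:
            sys_map[sub_sys] = next_parent_sys
            next_parent_sys += 1
    return next_parent_sys
```

Importing the three new nodes with 45 as the first free parent number turns the map into {1: 8, 2: 45, 3: 46, 4: 47}, exactly as in the example.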


Fig 2

System number maps are unnecessary if you are only going to import the subfile once, because in such cases you can presumably throw the subfile away once it has been imported. However, if the subfile is going to have an ongoing existence, and will be imported back into the parent multiple times as and when new data becomes available in the subfile, then you need to know which nodes in the subfile correspond to which nodes in the parent file, and a system number map is required. The map will be a living document, because it will need to be updated whenever any additions are made to the subfile nodal structure.



You will have noticed that I’ve been talking about importing into the parent file, rather than importing into the master file. A parent file is simply the file from which a subfile was originally derived, and may be the master file or may be a subfile itself. System number mapping applies at all levels in the subfile structure.



Suppose that subfile 2 as described above is imported back into its parent, and then at a later date the subfile owner decides that node 1.3.3 is not required, and deletes it. This poses a problem when the subfile is next imported into its parent, because while node 1.3.3 will exist in the parent (because it was placed there on a previous import), it no longer exists in the subfile. We get around this difficulty by negating numbers in the system number map which have been deleted in the subfile so that there is a permanent record of the deletion, and allowance can be made for it on any future imports.
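One way to realize the negation idea in the dictionary sketch from before (again, my own illustration of the scheme, not a prescribed format):

```python
def record_deletion(sys_map, sub_sys):
    """Negate the parent-side entry to leave a permanent record that
    this subfile node was deleted after a previous import."""
    sys_map[sub_sys] = -abs(sys_map[sub_sys])

def is_deleted(sys_map, sub_sys):
    """A negative entry marks a node deleted in the subfile; future
    imports can check this and make allowance for it."""
    return sys_map.get(sub_sys, 0) < 0
```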


There is no provision for the opposite case, where the same node is deleted in the parent file, because the deleted node will simply be reinstated when the subfile is next imported. This implies that data flow is always one-way, i.e. from subfile to parent. You can design systems that permit two-way flow, but they are a whole lot more complicated, so you don’t use them unless you really need to.

Creating subfile families

I had meant to post this a long time ago, but have been involved in other blogs and writings. My apologies. I see from some of the comments that many of you out there are interested in what I’m saying, but need some more details. OK, here goes.

This post is about setting up a master file and creating a family of subfiles. In the multi-level system of subfiles that I described in earlier posts, there must be a master file at the top of the hierarchy from which everything else is derived. I will assume for the sake of argument that the master file is based on a hierarchical work breakdown structure, such as the one shown:

Fig 1

You can have other types of data arrangements besides hierarchical, but the hierarchical structure is the most common. I’ll repeat what I said in an earlier post: this represents a work structure, not a database structure. Most databases nowadays are based on relational structures, but I’m not concerned with that here.

Let’s suppose now that node 3.0 is hived off as an independent subfile. Since this is the first subfile to be created, it becomes subfile 1. Next, node 1.3 might be hived off, and it becomes subfile 2, then node 1.2.1 might be hived off as subfile 3. You will see that subfile numbers refer solely to the order in which they are created, and have nothing to do with the number of the node which is hived off to create them.

Subfile owners are at liberty to expand the nodal structures of their subfiles, and create new subfiles. Let’s suppose that the owner of subfile 2 adds new nodes, 1.3.1 and 1.3.2 to the subfile, and hives each of them off as level 2 subfiles. These now become subfiles 2,1 and 2,2 (we use commas as separators to avoid confusion with node numbers such as 2.1 and 2.2). Let’s then suppose that the owner of subfile 2,1 expands the structure in that subfile and subsequently hives off four level 3 subfiles, known as subfiles 2,1,1, 2,1,2, 2,1,3 and 2,1,4 respectively. This whole sequence is shown below:

Fig 2
We now have two separate, and completely independent, hierarchical structures: the original nodal work breakdown structure and the subfile structure. The subfile structure is always hierarchical because of the way it is derived.



In order for the whole sequence to begin, a master file must be created. The master file must encompass the entire nodal structure, albeit in a very abbreviated form, such as:

Fig 3
In addition, the master file must have a unique recognition code that can be copied to all subfiles derived from it so that those subfiles can be recognized as part of that subfile family. This then is the minimum requirement for a master file – a top-level nodal structure and a recognition code. Obviously you can, and usually do, put more into a master file than this, such as common data fields, but you don’t have to.



Subfile owners are at liberty to expand the nodal structure in their subfiles. For example, as mentioned above, the owner of subfile 2, which was based on node 1.3, added nodes 1.3.1 and 1.3.2 to the subfile structure. These new nodes would be added to the master file when the subfile was imported back into the master file. However, while subfiles can add descendant nodes, they are not allowed to add parallel nodes, so, for example, subfile 2 can’t add a node 1.4. This prevents a subfile from creating nodal structures which could interfere with work going on in other subfiles.
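If node numbers are represented as tuples (so node 1.3 is `(1, 3)` — a representation I'm choosing purely for illustration), the descendant-only rule is a one-line prefix check:

```python
def may_add_node(subfile_root, new_node):
    """A subfile based on node `subfile_root` (e.g. (1, 3)) may add
    descendants such as (1, 3, 1), but not parallel nodes like (1, 4),
    nor the root node itself."""
    return (len(new_node) > len(subfile_root)
            and new_node[:len(subfile_root)] == subfile_root)
```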

One restriction which is usually implemented in subfile systems is that you are not allowed to hive off a node as a new subfile if an ancestor or descendant of that node has previously been hived off. In the example given here, node 1.3 has been hived off as subfile 2. If the master file owner then decides to hive off node 1.0 as a subfile, the owner of this new subfile would be at liberty to create a new 1.3 node. This means that there would be two independent 1.3 nodes floating around out there, which is probably not a good idea. The same problem would occur if a descendant of node 1.3, such as 1.3.1, is hived off from the master file as a new subfile.
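The hive-off restriction is another prefix test, this time in both directions. As before, node numbers are represented as tuples purely for illustration:

```python
def may_hive_off(existing_subfile_roots, candidate):
    """Refuse to hive off `candidate` as a new subfile if any node
    already hived off is an ancestor or descendant of it (or is the
    same node): two nodes are related when one's number is a prefix
    of the other's."""
    def related(a, b):
        shorter = min(len(a), len(b))
        return a[:shorter] == b[:shorter]
    return not any(related(candidate, root)
                   for root in existing_subfile_roots)
```

With node 1.3 already hived off, hiving off its ancestor 1.0 or its descendant 1.3.1 is refused, while an unrelated node such as 2.0 is fine.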

Subfile lookup tables

I’ve been involved in some other business for the past few weeks and haven’t had time to write anything in this blog. OK, back to work. I want to talk about subfile lookup tables, generally known as subfile LUTs.

Whenever a subfile is created, a set of data is created at the same time that controls how it will eventually be imported back into the parent. This is the subfile LUT. Think of it as a wiring diagram that specifies which parts of the subfile connect to which parts of the parent file. The subfile LUT must track any changes made to the subfile, so subfile LUTs are living documents that evolve as the subfile evolves.

By the way, as originally conceived, a subfile LUT was a single data map converting references in the subfile to references in the parent file. However, as the whole concept of subfiles and export/import was developed, more and more data got added to the subfile LUT until it has become a collection of lookup tables and other data. We just haven’t got around to finding a suitable new name for it.

There will need to be a subfile LUT not only for every subfile, but for every import transition that the subfile makes. Every subfile will need a LUT for import back to its parent, but if that subfile is ever imported into a more distant ancestor, there will need to be a LUT created for that import as well.



There can be many subfile LUTs stored in the system, so we are going to need a method of identifying them. We do this using subfile numbers and destination levels.

Subfile Number

Each subfile is assigned a serial number relative to its parent (the first subfile exported is no. 1, the second is no. 2, and so on). Since the parent, if it is also a subfile, will have its own serial number relative to its parent, any subfile at any level is uniquely characterized by a chain of serial numbers which details its ancestry back to the master file. So, as I mentioned in an earlier post, a level 3 subfile might, for example, be denoted as (2,1,4), indicating that it is the 4th subfile of the 1st subfile of the 2nd subfile of the master file. This series of numbers is known as the subfile number and is stored as part of the subfile LUT, where it identifies the subfile to which that LUT refers.

Destination Level

A subfile is normally imported into its immediate parent, but may also be imported into a more distant ancestor. Since the one-to-many hierarchical structure of the subfile constellation ensures that any subfile can only have a single ancestor at any subfile level, it is only necessary to specify that level in order to identify the ancestor unambiguously. For example, the second-level subfile which is the parent of subfile 2,1,4 can only be subfile 2,1, and the first-level subfile which is its grandparent can only be subfile 2. The subfile level of the parent or ancestor to which the various maps relate is known as the destination level, and is stored as part of the LUT.
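Because the destination is always an ancestor, a destination level simply selects a prefix of the subfile number. A sketch, with the master file written as the empty tuple:

```python
def ancestor_at_level(subfile_number, destination_level):
    """Return the unique ancestor of a subfile at the given level:
    just a prefix of its subfile number. For subfile (2, 1, 4),
    level 2 gives (2, 1), level 1 gives (2,), and level 0 gives
    the master file, written here as the empty tuple ()."""
    if destination_level >= len(subfile_number):
        raise ValueError("destination must be a proper ancestor")
    return subfile_number[:destination_level]
```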



The primary purpose of a subfile LUT is to map data from the subfile to its import destination. Two types of data need to be considered, structural data and common data.

Structural Data

One of the problems with creating independent subfiles is that it is possible for the structure of the parent file to be modified after the subfile has been created. For example, if node 2.1 was exported as a subfile, but the parent file owner decides afterwards that node 2.1 should really be node 4.4.3, there is going to be a problem when the subfile is imported back into its parent.

We get around this problem by attaching a unique number, called a system number, to each node. System numbers never change, even if the node is moved to a different place in the structure. Node numbers such as 2.1 or 4.4.3 are now just addresses which tell you where a particular node is currently located in the structure. Think of node numbers as cubicle numbers in a large office – they just tell people where to find you. However, your real identity, as far as the organization is concerned, is your payroll number, which never changes no matter where you are physically located in the office.

System Number Map

System numbers in subfiles probably won’t be the same as their counterparts in the parent file. Suppose we have exported node 2.1 as a subfile, and let’s suppose this node has system number 8 in the parent file. We then add three descendants to this node in the subfile – 2.1.1, 2.1.2 and 2.1.3. Let’s suppose these are given system numbers 9, 10 and 11 in the subfile. However, when the subfile is imported back into its parent, the numbers 9, 10 and 11 might already be taken by nodes imported from another subfile, or by new nodes created directly in the parent. Consequently, the three new nodes will have to be given other system numbers – let’s say 31, 32 and 33. The system number map will then record the fact that node 9 in the subfile is node 31 in the parent, 10 is 32, and so on.

The map isn’t necessary if the import process occurs once and once only. It is only necessary if the subfile is imported a second or further time for update purposes, because then you need to know which nodes in the subfile correspond to which nodes in the parent.

Common Data Maps

Common data is non-structural data which can be applied to more than one node. For example, in a costing application, the number of labor hours to perform a particular task would be specific to a particular node, but the rate per hour for that labor would be common data since it could presumably be used in many nodes. Common data items are assigned field numbers for reference purposes. Field numbers are analogous to system numbers in the structural data, and are subject to the same requirements for mapping and tracing upon import.

When a subfile is first exported from its parent file, its common data will have the same field numbers as its parent, because the subfile normally inherits the entire common data set of the parent without change. If the subfile subsequently generates new common data items, these will be assigned their own field numbers in the subfile. When the subfile is imported back into its parent file, the new items will be imported into the parent where they must be given appropriate field numbers. However, the field numbers used in the subfile may already be occupied in the parent by imports from other subfiles or by new items created directly in the parent file, so these new subfile data items may have to be assigned different field numbers in the parent.

In the same way as for node system numbers, this difference in field numbers won’t matter if the subfile is imported only once. However, if the subfile is imported a second or further time, it will be necessary to have a lookup table which maps field numbers between subfile and parent. There will usually be a separate map for each distinct type of common data.



If a subfile is imported into a more distant ancestor for update purposes, it will be necessary to establish the correspondence between subfile and ancestor system numbers and common data field numbers. This is done by tracing these numbers up the chain of maps for each single-level import between the subfile and its ancestor. In order for this to be possible, all the single-level imports between subfile and ancestor must have already occurred so that data maps are available at each level. Thus for example, if a level 3 subfile is to be imported into the level 0 master file, the corresponding 3/2 import (i.e. level 3 subfile imported into its level 2 parent), 2/1 import and 1/0 import must all have previously occurred.
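The tracing step is just a composition of the single-level maps. A sketch, assuming each map is a dictionary as in my earlier examples; a missing entry means the corresponding single-level import hasn't yet occurred:

```python
def trace_to_ancestor(number, maps):
    """Trace a system number (or common data field number) up a chain
    of single-level maps, e.g. [map_3_to_2, map_2_to_1, map_1_to_0]
    for a 3/0 import. Raises KeyError if an intermediate import has
    not yet taken place, so no map entry exists at that level."""
    for level_map in maps:
        number = level_map[number]
    return number
```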

Whereas subfile-to-parent LUTs are created when the subfile itself is created, subfile-to-ancestor LUTs are created only when a multi-level import takes place.


Multi-generational families of data packets

In a previous post I mentioned the possibility of data packets ‘hiving off’ their own data packets, and so ad infinitum. In this way you can build up a multi-generational family of data packets.

Let’s call the data packet at the top, i.e. the one from which everything else is derived, the master file. Data packets derived directly from this would be level 1 subfiles, data packets derived from level 1 subfiles would be level 2 subfiles, and so on. You can set up a numbering system for subfiles similar to decimal numbers used in, say, work breakdown structures. For example, a level 3 subfile might be denoted as (2,1,4), indicating that it’s the 4th subfile of the 1st subfile of the 2nd subfile of the master file (I’m using commas rather than periods as separators to avoid confusion with decimal numbers).

The decision to create a low-level subfile doesn’t have to be planned in advance at the master file level, but can be made at the immediate parent level. For example, the owner of a level 2 subfile can decide on his/her own to create any number of level 3 subfiles. You can, of course, set things up so that the subfile family structure is determined from the get-go, but you don’t have to.

Each subfile in a multi-generational family would share the same recognition code as the master file. Consequently, only subfiles derived from the original master file could be imported back into a higher level file, and by comparing the subfile numbers, such as (2,1,4), you can ensure that subfiles can only be imported back into their immediate parents, or their ancestors. A multi-generational family is therefore a closed system which is almost impossible to break into.
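Both checks, the family check and the ancestry check, can be sketched in a few lines. The dictionary shape here is my own illustration, and the master file's subfile number is written as the empty tuple:

```python
def may_import(subfile, destination):
    """A subfile may be imported only into a file in its own family
    (same recognition code) that is an ancestor of it: the
    destination's subfile number must be a proper prefix of the
    subfile's own number."""
    same_family = (subfile["recognition_code"]
                   == destination["recognition_code"])
    sub_no, dest_no = subfile["number"], destination["number"]
    is_ancestor = (len(dest_no) < len(sub_no)
                   and sub_no[:len(dest_no)] == dest_no)
    return same_family and is_ancestor
```

Under this check, subfile (2,1,4) can be imported into (2,1), into (2,), or into the master file, but not into an unrelated subfile such as (2,2), and never into any file from a different family.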

It’s possible in a multi-level family to import a low-level subfile across more than one generation, provided that all the intermediate imports have occurred previously. For example, if a level 3 subfile, say (2,1,4), has developed some information which is urgently needed in the master file, you can import it directly into the master file (known as a 3/0 import, i.e. level 3 into level 0) provided the intermediate 3/2, 2/1 and 1/0 import sequence has already occurred. (If you tried the 3/0 import without the intermediate sequence being in place, the master file may not ‘know’ of the existence of the level 3 file, and have no place for its data to go.)

Multi-generational importation brings with it some problems, such as prior importation. This can occur, for example, when a new data field is created in a low level subfile, and is then imported directly into a high-level ancestor. If the low-level subfile is then subsequently imported into its immediate parent, and so on up the chain, the new field can reach the high-level ancestor by two different routes, so you could end up with the same field being treated as two different fields. More on this later.

Trying to spoof the system

Let’s suppose that someone wants to break into a system which is based on the large data packet/integral recognition code architecture that I described in previous posts. Let’s look at how this could be accomplished.

In order to gain entry into such a system you have to have a data packet with a valid recognition code, because the system will be configured such that it won’t accept anything else. Assuming that the packet is going to be fabricated ab initio, you need to know two things – first the recognition code, and second the way it is encoded in the data packet.

Recognition codes can be long and random, because there is no requirement for them to be memorized. Encoding can be as complex as you like, because decoding is purely a computer-to-computer operation (no human involvement). Consequently, the only remotely feasible way of spoofing the system is to get hold of a bona fide data packet and (presumably) insert your own worm inside it.

Getting hold of a bona fide data packet probably means getting access to the hardware on which it is stored. While this is not impossible, it’s a whole lot harder than simply discovering someone’s password. We can assume that a mobile device will be protected in some way, either biometrically or by a password. If the latter, it can be a strong password because the device owner will only need to remember that one password, not a whole raft of them for different websites.

Let’s suppose however that you have got hold of a bona fide data packet and have managed to insert a worm into it. The next hurdle you have to face is that large data packets are only transmitted on an occasional basis – say once a day – and the transmission process is not expected to be in real time. This will give plenty of opportunity for exhaustively examining the data to see if there is anything there that shouldn’t be there. Again, it’s not impossible for a cleverly concealed worm to escape detection, but it just makes it that much more difficult.

Nothing I’ve described so far represents an absolute barrier to an intruder. However, it does make things much more difficult, which means that the probability of an intruder getting in is that much less. As I said in an earlier post, we may not be breaking the intruder/cyber security cycle, but at least we can give it a flat tire.

Exporting and importing subfiles

In my last post I talked about using large data packets, called subfiles. In this post I want to discuss how subfiles are created in the first place (exported) and re-integrated back into the parent system (imported).

Let’s assume that the data structure from which the subfile is to be created is capable of being subdivided into independent sections. Typically, this would involve a hierarchical structure. Yes, I know that if you are a database expert, hierarchical structures are passé, but I’m talking about work structures, not databases. Work structures are almost always hierarchical.

A typical work structure consists of a series of nodes linked in parent/child/sibling relationships. Each node can potentially be a data container. A subfile will encompass a single node (which may or may not already have some descendant nodes attached to it) plus any data which those nodes already contain. Additionally, there will probably be common data which can be associated with multiple nodes.

Using the example of the sales rep in my last post, his node might be ‘North-Eastern division, industrial plastics sales’. He may have several subnodes for different customers or perhaps for different products, however the data is organized. Each node may contain data about sales orders. In addition there will be common data, such as unit prices for various products. A subfile will be ‘hived off’ (exported) containing all this data, plus a unique recognition code which enables the parent system to recognize the subfile when it is imported back into the parent system. Since this code does not need to be memorized by the sales rep, it can be as long and as complicated as you like. It’s encoded so that it can only be read by the parent file.

In the field the sales rep can populate the nodes of his data with sales orders. He may also add new nodes for new customers, or he may change some of the unit prices if he has authority to negotiate prices, or change some of the terms and conditions of sales.

When the subfile and its data are imported back into the parent system, there must necessarily be a map telling the parent system which nodes in the subfile go where, and which elements of common data in the subfile are associated with which elements in the parent file. We call this map a subfile lookup table, or subfile LUT. It’s created when the subfile is first exported and updated every time the subfile is imported back into the parent file.
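At export time the LUT can start out very small. A sketch of the minimum contents described above (the field names are mine, not a prescribed format):

```python
def create_subfile_lut(subfile_number, destination_level, root_parent_sys):
    """Minimal subfile LUT as created at export time: it identifies
    the subfile, names the level of the file it will be imported
    into, and starts with a one-entry system number map for the
    exported root node."""
    return {
        "subfile_number": subfile_number,        # e.g. (2,) for subfile 2
        "destination_level": destination_level,  # 0 = the master file
        "system_number_map": {1: root_parent_sys},
        "common_data_maps": {},  # one map per type of common data, added later
    }
```

For subfile 2, exported from the node with system number 8 in the master file, this gives a LUT whose system number map is {1: 8}; every later import updates the maps in place.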

The subfile LUT is central to the whole concept of independent subfiles. More on this in later posts.

Built-in passwords

Like many of us, I have usernames and passwords for a whole raft of different websites. Many of these I access rarely, some of them once only. Provided I have never given a credit card number on these sites I usually use my ‘standard’ password, which is one I can easily remember, because I couldn’t care less if some smartass decides to impersonate me by logging on in my name.

For some websites, however, I do care if someone discovers my password – my bank account, for one. That particular password is a strong one, and I take care that it isn’t written down anywhere.  I have a small number of strong passwords for important functions in my life, and that just about exhausts my capability to remember them. I could, I suppose, use a password manager, but some day someone is going to come up with a way to hack into password managers, so I don’t feel too happy about using them.

We’ve been experimenting with an alternative method of protecting data. The essence of the method is this: instead of frequently transferring small packets of data on demand to real-time applications, and receiving packets back, also in real time, we transfer much larger packets on an occasional basis, and, if necessary, receive updated packets back again, also on an occasional basis. Each packet has built into it a long code sequence which identifies it to the receiving system, so it is virtually impossible for a spurious packet to get in. In addition, since packets are transferred in non-real time (i.e. you are not demanding an immediate response when you transfer them), there is no real-time constraint on processing received packets, so you can subject those packets to a much greater level of scrutiny than would be the case in real-time mode.
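As an illustration only (the posts above don't commit to any particular encoding scheme), an HMAC is one standard way of binding a long, never-memorized code into a packet so that the receiving system can verify it:

```python
import hashlib
import hmac
import secrets

# The built-in code never has to be memorized, so it can be a
# random 256-bit value shared between sender and receiver.
RECOGNITION_CODE = secrets.token_bytes(32)

def seal_packet(payload: bytes, code: bytes) -> bytes:
    """Append a MAC derived from the recognition code to the payload,
    so the packet carries its own proof of origin."""
    return payload + hmac.new(code, payload, hashlib.sha256).digest()

def accept_packet(packet: bytes, code: bytes):
    """Return the payload if the packet carries a valid code,
    otherwise None (a spurious or tampered packet is rejected)."""
    payload, tag = packet[:-32], packet[-32:]
    expected = hmac.new(code, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None
```

Since the transfer is not real-time, the receiver can afford to do this verification, and any amount of further scrutiny, before letting the packet anywhere near the live system.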

If you can arrange your business model so that all data traffic is in the form of these large packets, it will be very difficult for anyone to break into your system.

Let’s take a simple example to illustrate this. Company X has its reps in the field with mobile devices, providing quotes to potential customers and recording orders. Every transaction on the mobile will normally require access to a central database, and the rep will expect a near real-time response. Any orders booked will be recorded immediately in the central database. Consequently, there must be an open channel between the mobile device and the central database. Now suppose we download that part of the central database which is relevant to the rep (product descriptions and prices, say) into the rep’s mobile device. Any orders booked will be stored on the mobile device, and once a day any new information can be uploaded to the central database. As a result, we can close down that open channel for most of the time.

In order for this to work, the remote application must contain all the processing capabilities needed to deal with data, and each packet must typically contain a fair-sized database of information for that processing to be possible.

While this works in the simple case described above, what happens when the remote application is able to make changes to the structure and content of the central database, such as adding new customers, rather than just filling in blanks in pre-defined forms? This is where the technology we’ve developed comes in. More of this in a later post, but if you want to read a technical paper on it, you can Google its title: Cooperative Data Manipulation in a Low-Connectivity Environment.

Another aspect of this technology is its multi-generational capability. If you create a data packet (we call them subordinate files, or subfiles for short) and give it to one of your agents, that agent can split off his own subfiles and give them to someone else. A hierarchical structure of subfiles can be built up in this way. More on this in a later post.

Passwords – do we really need to memorise them?

At the back of everyone’s mind (and quite often at the front) is the fear of being hacked, of having unfriendly people rummaging around in your sensitive data. The sad fact is that, as soon as your computers are connected to a telecommunications line, you are vulnerable to this kind of attack. No matter how good your defences, if a sufficiently motivated and technically competent intruder decides to get into your system, chances are they will succeed.

The weak point in your system is the telecommunications access. If you can get in, so can they. While a competent hacker has many ways of gaining unauthorized access to a system, getting in is easy if they have obtained a username and a password.

The problem is that most people aren’t very good at remembering passwords which are long, random character sequences (I know I’m not). If you leave people to set up their own passwords, chances are they will use something like ‘qwerty’ or ‘12345’. Conversely, if you assign them random sequences like ‘k*5Gsp/4Q%’, which you change every three months anyway, they often write them down somewhere easily accessible (such as a post-it note stuck to their monitor – don’t laugh, I’ve seen it), which rather defeats the purpose of having them.

Instead of saying that people must learn to remember long, random passwords, which we know people aren’t very good at, we should really ask ourselves what people are good at. One of the things that people usually do quite well is looking after their personal property. Very few people lose their smartphones. OK, you hear about it from time to time, but the reason you hear about it is that it isn’t very common.

Another point worth considering is that, to the best of my knowledge, no-one has ever hacked into a space satellite. Sending signals to a satellite is easy – all you need is a big dish pointing at the sky. However, getting the satellite to take any notice of those signals is a lot more difficult, because satellite command and control systems are protected by long, complex code sequences. The reason these can be long and complex is that they are built into the control equipment, so no-one needs to remember them. The only way to get at these code sequences is to break into the control room, and these usually have darn good physical security.

Now put these two ideas together – can we arrange things so that the passwords necessary for access are built into the data itself? Passwords could then be as long and as complicated as you like, because there would be no need for people to remember them. Access to the passwords would require physical access to a terminal, and people tend to be fairly good at protecting such access, whether it be a desktop terminal in an office or a mobile device.

There are of course a number of other problems that need to be sorted out before this becomes a practical way of doing things. More on this in my next blog.

What’s it all about?

This blog is all about cyber security, and in particular whether there are better ways of protecting data than are currently in use. Comments are welcome.

During World War II a couple of salesmen were waiting in an anteroom in the Pentagon in Washington and started talking to each other. The first one said he was from a company that manufactured armor-piercing shells. “Everything was working well, then we get a call from the Pentagon to say that our shells aren’t penetrating the standard armor plate they use for test purposes. So we make an improved shell, and that works for a month or two, until we get another call saying the shells are failing again. I just don’t understand it.”

The other salesman said “That’s funny, my company makes armor plating, and we’ve got the same problem. We got a call saying the standard armor-piercing shell they use for testing was going through our armor plating, so we made an improved version, which worked for two or three months until we got a call saying the standard shell was defeating it again. I just don’t understand it either.”

Cyber security is a bit like this. A security threat arises, whether we’re talking about viruses, botnets, Trojan horses, spyware or whatever, whether the perpetrators are nerds in basements, criminal organizations or nation states, and the software industry responds. This works for a while until a new and improved threat arises, then the cycle begins all over again. Is there a better way of doing things? Can the cycle be, if not broken, at least given a flat tire?

I’ve got some ideas on how this might be done. It’s not a complete solution, but it might lead to better ways of doing things. I would welcome inputs from others because I think a cooperative effort can sometimes produce interesting new insights. And, oh yes, I know that this blog will probably be read by hackers as much as security experts. I don’t think it matters because the object is to find a method of cyber security that works in spite of anything a hacker can do.



There are many reasons for hacking into a computer system. At one extreme there’s the personal hygiene-challenged nerd in his parents’ basement, doing it simply because he can, with a dash of the-world-doesn’t-love-me thrown in. There’s the hacktivist who wants to make a political point, trying to embarrass the government or an oil company or whoever it happens to be fashionable to deplore this week. There are cyber warriors whose purpose is to cause harm to those they hack. The Stuxnet worm was a good example of this, reportedly destroying some of the gas centrifuges used in Iran’s uranium enrichment program. (Interestingly enough, the computer network that controlled these machines wasn’t connected to the outside world, so there must have been some James Bond-type activities to get the worm into the system.) And then there are those who want to get hold of someone else’s data for financial gain.

I’m not going to talk about cyber warriors, at least not too much, because they are usually agents of nation states with access to all the resources of a nation state. They tend to specialize in what are described as advanced persistent threats (APTs), in which they spend a long time insinuating themselves into a system and setting up a virtual camp there. They are not so much burglars as Special Forces troops sent out to infiltrate the enemy’s stronghold.

My main interest at this time is in the burglars of the hacking world, who break into your data systems on a one-time basis. Why they do it isn’t as important as the fact that they usually have a common purpose, which is to gain access to data to which they have no right.



My grandfather lived in a village in Ireland, where he taught in a one-horse school. Nobody ever locked their doors, because the idea of anyone breaking into your house was simply unthinkable. Certainly you had nothing to fear from any of your neighbors, and any stranger coming into the village would be noticed instantly. Then my mother went off to live in the big city, and sure enough, people locked their doors there.

Burglary is possible in a big city because it provides anonymity. If you break into your neighbor’s house in a small village, you will almost certainly be found out. If you burgle a house in a city and aren’t too clumsy, there’s a good chance of getting away with it because you are just one more face in the crowd.

Before the internet, say 20 or 25 years ago, computers tended to form a series of villages. While a large company or a government department might have a considerable network of computers, they were not connected to the outside world, and the only way to break into the system was to gain physical access to a terminal. Then the internet came along, everyone’s computer suddenly became part of a world-wide network, and all those little villages coalesced into a gigantic metropolis. Burglary – because that’s what hacking is, data burglary – suddenly became possible.

When a person from a small village goes to live in the big city, they are often easy marks for swindlers and assorted crooks until they get wise to the ways of the big city. They tend to be too trusting, and lacking in self-protective cynicism. I think the problem we have today is that many of us still have small village mentalities with regard to our data and haven’t yet got accustomed to living in a big city.



There was no bank in my grandfather’s village, and there weren’t any credit or debit cards in his day either, so he kept most of his money in his house. Everybody paid cash for everything, everybody had cash in their houses, nobody locked their doors – and crime was almost non-existent. But transfer this model to the big city and you have problems. At first you can get round it by locking your doors, but burglars will find a way of getting around those locks. Eventually, of course, we morph into a cashless society, first with plastic debit cards and then with virtual cards on our smartphones, which makes life a lot more difficult for burglars.

I think one of our problems is that, as a data-owning society, we’ve moved to the big city, we’ve learnt to lock our doors, but we haven’t yet taken the next step in changing the way we do business. We’re still walking around with lots of cash in our pockets, so to speak, and haven’t yet grasped the concept of a cashless society. More on this in my next post.