
Is shipping getting big data wrong?

Pierre Aury returns to a familiar topic for his latest column. 

In this column we will again look at the new cloud-based maritime solutions, much as I did last month. There is not one advertisement from these vendors that doesn't mention big data. One vendor, for example, claims that its solution processes 1bn AIS signals a day. The comfort of big numbers validating decisions is hard to resist. But let us first look at what big data actually means.

Typically, the expression big data refers to data sets that contain a greater variety of items, in ever-increasing volumes, arriving with ever-increasing velocity. This is known as the three Vs. In practice, big data means larger, more complex data sets, especially from new data sources, so voluminous that traditional data processing methods simply cannot handle them. The claim is that these massive volumes of data can be used to generate business ideas and/or to address business problems that were impossible to tackle before.

So, going back to the claim that this system, or another, processes 1bn AIS signals a day: it does indeed tick the new data sources box, as AIS still qualifies as relatively new. But from an end-user point of view that is the only box being ticked. 1bn signals a day covers all ships above a certain size, so it comprises AIS signals sent by trawlers, tugs, heavylift carriers, cattle carriers, LNG tankers, cruise vessels and dry bulkers above 300 gross tons.

For a capesize vessel owner, the AIS data obtained from cattle carriers or cruise vessels is as useful as a chocolate fireguard, unless of course this owner has embarked on research to prove that ships tend to be found at sea more than inland. Our capesize owner is only interested in AIS data from dry bulk vessels, which probably brings the size of the data set down from 1bn to 100m a day, since, in number of ships, the dry bulk fleet is about 10% of the world fleet.
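To make that concrete, here is a minimal sketch in Python of the kind of filter our capesize owner would apply before the "1bn signals a day" become relevant to them. The field names (ship_type, imo, timestamp) and the ais_messages iterable are illustrative assumptions, not any particular vendor's schema.

```python
# Minimal sketch: keep only the AIS messages sent by dry bulk vessels.
# Field names are illustrative, not a real vendor schema.

RELEVANT_TYPES = {"bulk carrier"}  # the only segment our capesize owner cares about

def dry_bulk_only(ais_messages):
    """Yield only the messages sent by dry bulk vessels."""
    for msg in ais_messages:
        if msg.get("ship_type", "").lower() in RELEVANT_TYPES:
            yield msg

# If the dry bulk fleet is roughly 10% of the world fleet by ship count,
# the 1bn daily signals shrink to something on the order of 100m
# before any actual analysis has even started.
```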

100m would still be a big number to deal with, except for two things: frequency and predictability. The frequency of AIS updates is very high and is driven by the technology; it is far too high considering the speed at which ships sail. For most uses, one AIS noon position per day should be sufficient. The other point that deflates the big data claim is that AIS data is not random and is in fact highly predictable. A capesize vessel is not in the Atlantic on day one, in the Pacific on day two, then back in the Atlantic on day three. Most of this data is largely path-dependent.
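Continuing the sketch above (same invented schema, timestamps assumed to be ISO 8601 with a timezone), collapsing each vessel's stream of pings into a single daily noon position is a one-pass exercise, which is why the remaining 100m signals deflate further still:

```python
from datetime import datetime, timezone

# Minimal sketch: reduce a vessel's many daily AIS pings to the single
# position closest to 12:00 UTC. Message fields are illustrative.

def noon_positions(messages):
    """Return one message per (imo, date): the ping nearest to noon UTC."""
    best = {}  # (imo, date) -> (seconds_from_noon, message)
    for msg in messages:
        ts = datetime.fromisoformat(msg["timestamp"]).astimezone(timezone.utc)
        key = (msg["imo"], ts.date())
        noon = ts.replace(hour=12, minute=0, second=0, microsecond=0)
        distance = abs((ts - noon).total_seconds())
        if key not in best or distance < best[key][0]:
            best[key] = (distance, msg)
    return [msg for _, msg in best.values()]
```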

Another claim made for these systems concerns the real-time insights they provide, which has to be put in perspective against the slow speed of ships and the very low frequency of trading.

Now, AIS signals and other shipping-related data must be studied (being a student of the market is a condition for success in trading), but we shipping people must never forget that shipping is not an adiabatic system: we are merely a derivative of the world economy and geopolitical situation, which is affected by decisions of lunatic politicians that no AIS signal is going to predict.

Finally, having experienced the dot-com bubble, this column cannot end without mentioning a sense of déjà vu. Is losing money once again a plus in the digital tech world? It seems that higher cash burn combined with lower billings is again attracting investors.

Pierre Aury

Starting out as a cadet with Louis Dreyfus in 1977, Pierre’s shipping career has taken him across the world including stints in Sydney, Istanbul, London and Paris, working for Clarksons, Enron and Platou along the way. Among a host of roles as a shipping consultant for the past eight years, Pierre heads up Competitive ShipBrokers, an association with 14 famous brokers as members.

Comments

  1. I tend to agree with Pierre's observations on big data; it is pointless to deal with big data without having a goal to achieve, irrespective of the three Vs of variety, velocity and volume. Besides, each of these metrics needs to be dealt with separately at first and then combined into a performance observation study, and this needs to be done daily or even instantaneously. So imagine and estimate the kind of computing power required, and to what effect, only to arrive at the same conclusions one would reach using first principles? I think it is a waste of time and money.
