From 2019 to 2021, I was a master’s student in Applied Computer Science at the Norwegian University of Science and Technology (NTNU). Throughout my studies, I had the opportunity to work as a part-time developer at Maritime Optima (MO). For my final dissertation, I chose to collaborate with MO and dive into the topic of vessel destination prediction, investigating the possibility of applying Machine Learning (ML) techniques to the problem area.
The thesis was finished in June 2021, and it was titled “Vessel destination forecasting based on historical AIS data”. In this blog post, I will try to explain parts of the process I used to construct voyages from historical AIS data, build an ML training dataset, and train a model to predict vessels’ next destinations.
The shipping industry is a vast and complex trading system that has an extensive impact on the global economy. It accounts for approximately 90 % of all world trade. Interested parties such as brokers, charterers, and investors are all continuously searching for accurate information that can help them understand the future ebbs and flows of this volatile market. The market is generally defined by supply and demand where, in this case, demand consists of available cargoes to be shipped, and supply consists of vessels available to ship the cargoes. Being able to effectively forecast the future destinations, and thereby the availability, of shipping vessels is therefore key in the decision-making processes of many of the people involved in the industry.
In recent years, numerous software products have been developed that aim to assist maritime companies in their decision-making processes. Many of these products are based on the availability of Automatic Identification System (AIS) data. AIS has become a globally adopted standard enforced by the International Maritime Organization (IMO) since 2006 for safety and navigation reasons. However, since AIS transmitters emit all commercial vessels’ navigational data, it also has commercial value in that it provides a global overview of shipping vessels’ movements over time. Recent studies of historical AIS data further indicate that it is indeed applicable to predicting future trajectories and movements of vessels, and that Machine Learning (ML) techniques can be applied to this topic area.
The topics covered by my project mainly include applying computer science techniques to the problem of predicting shipping vessels’ future destinations and voyage patterns to assist various companies in the shipping industry in their daily decision making processes. More specifically, the thesis focuses on the aspect of applying ML techniques to vessel destination prediction using different sources of vessel information such as AIS, voyage patterns, and individual vessel information such as vessel types, or segments.
Maritime Optima provided me with the initial data foundation I used throughout my thesis. This data consisted of a historical AIS dataset containing more than a billion positional records from over 60 000 vessels, spanning December 2019 to March 2021. The image below visualizes 200 million of these AIS positional records to show the vastness and coverage of the historical AIS dataset.
MO also provided me with their collection of vessel descriptions, mainly each vessel’s segment and sub-segment values. The figure below is a screenshot from MO’s web platform, showing how different sub-segments of the dry bulk cargo segment travel in different areas of the world. Since this categorization provides valuable insights into voyage patterns, vessel segmentation is included in the thesis’s proposed approach to vessel destination prediction.
Lastly, MO has an extensive port database consisting of more than 5 600 shipping ports that have been manually validated to ensure that they are commercial shipping ports. This was another important part of the initial data foundation I used throughout the thesis.
In order to effectively predict a vessel’s future destination, or analyze voyage patterns in general, a vessel voyage must first be defined. A definition is needed to be able to construct voyages from AIS data, and it affects the outcome of any prediction method. There might be several different reasons for a vessel to visit a port, not all of which mean that the port was the vessel’s final stop in a voyage.
Larger vessels traveling long distances often have to bunker (refuel) at bunker ports between the port where they loaded cargo and the port where they will eventually unload it. In some cases, vessels anchor outside such bunker ports waiting to be refueled by bunker vessels, while in other cases they can reduce their speed and be refueled without ever stopping completely. Another common reason for vessels to physically stop moving is congestion in ports. Very often, vessels of any size have to wait their turn before loading or unloading at busy ports.
For the purpose of the thesis, an arrival was defined only when the vessel herself reports being moored through the navigational status in the AIS data. With this definition of an arrival event, we can construct voyages based on the AIS positions transmitted between two arrivals. As an example, in the image below, a vessel’s navigational status transitioned from “moored” to “under way using engine”. When the vessel arrived at her final destination, the navigational status was changed back to “moored”. Connecting the AIS positional records between these two events in time gives us the vessel’s trajectory.
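The voyage definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis code: the `AisRecord` fields, the status strings, and the `construct_voyages` function are hypothetical names chosen for this post. A voyage starts at the last “moored” record before the status changes and ends at the next “moored” record.

```python
from dataclasses import dataclass
from typing import List, Optional

MOORED = "moored"  # assumed label for the AIS "moored" navigational status


@dataclass
class AisRecord:
    timestamp: int   # e.g. seconds since epoch
    lat: float
    lon: float
    nav_status: str  # e.g. "moored", "under way using engine", "at anchor"


def construct_voyages(records: List[AisRecord]) -> List[List[AisRecord]]:
    """Group one vessel's AIS records into voyages between two moorings."""
    voyages: List[List[AisRecord]] = []
    current: Optional[List[AisRecord]] = None  # the voyage being built, if any
    previous: Optional[AisRecord] = None

    for rec in sorted(records, key=lambda r: r.timestamp):
        is_moored = rec.nav_status == MOORED
        was_moored = previous is not None and previous.nav_status == MOORED

        if was_moored and not is_moored:
            # Departure: start from the last moored position.
            current = [previous, rec]
        elif current is not None:
            current.append(rec)
            if is_moored:
                # Arrival: the vessel reports being moored again.
                voyages.append(current)
                current = None
        previous = rec

    # Records before the first mooring and after the last arrival are
    # dropped, since their departure or arrival port is unknown.
    return voyages
```

Note that a stop with the status “at anchor” does not end a voyage in this sketch, which matches the idea that anchoring for bunkering or congestion is not an arrival.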
It is worth noting that there are a few alternative approaches to this problem proposed in related research. The most promising of these revolved around detecting dense clusters of AIS records transmitted close to known shipping ports. A vessel’s voyage, or trajectory, could then be defined as the positions transmitted between two subsequent clusters detected near two different ports. I investigated this specific approach in the thesis. However, I found that its main drawback is the lack of context as to why the vessel is stopping in close proximity to a port. As mentioned at the beginning of this section, vessels might have various reasons for visiting ports along a longer voyage, and when only considering the density of AIS messages, we have little indication as to the nature of the port visit. In contrast, the navigational status should reflect the vessel’s navigational intent, and vessels should only use the “moored” signal when mooring. For instance, when vessels visit ports to bunker, or are waiting for a congested port, the navigational status “anchored” is used instead.
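To make the drawback of the density-based alternative concrete, here is a deliberately naive sketch of it. It simply counts AIS positions within a radius of a port. The function names, the 5 km radius, and the threshold of 10 records are assumptions made for this illustration; the related research uses proper clustering methods rather than a plain count.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def looks_like_port_visit(
    positions: Iterable[Tuple[float, float]],
    port: Tuple[float, float],
    radius_km: float = 5.0,   # hypothetical radius
    min_records: int = 10,    # hypothetical density threshold
) -> bool:
    """Flag a port visit when enough positions fall within the radius."""
    nearby = sum(
        1 for lat, lon in positions
        if haversine_km(lat, lon, port[0], port[1]) <= radius_km
    )
    return nearby >= min_records
```

A check like this cannot tell a vessel that is moored and unloading from one that is anchored nearby waiting for bunkers, which is exactly the missing context described above.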
In order to consider both vessels’ spatial trajectories and voyage information, I constructed a method of abstracting trajectories into categorical and numerical values. This enabled me to apply more common classification machine learning techniques to the problem area. I achieved this by making an initial prediction based purely on the spatial trajectories, using a trajectory similarity measure called the Symmetric Segment Path Distance (SSPD). After constructing voyages using the aforementioned voyage definition, I made an initial prediction for every historical trajectory by comparing it to every other historical trajectory that departed from the same port.
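For readers unfamiliar with SSPD, the sketch below shows the core idea: for each point in one trajectory, take the distance to the closest segment of the other trajectory, average those distances, and then average the result over both directions so the measure is symmetric. This is a minimal version that assumes planar (x, y) coordinates and trajectories with at least two points each; real AIS positions would need a geodesic or projected distance, and the function names are chosen for this post.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_to_trajectory(p: Point, traj: Sequence[Point]) -> float:
    """Distance from p to the closest segment of a trajectory."""
    return min(point_to_segment(p, a, b) for a, b in zip(traj, traj[1:]))


def spd(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """Segment Path Distance: mean distance from the points of t1 to t2."""
    return sum(point_to_trajectory(p, t2) for p in t1) / len(t1)


def sspd(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """Symmetric Segment Path Distance: lower means more similar."""
    return (spd(t1, t2) + spd(t2, t1)) / 2
```

Because the distance is measured against segments rather than points, two trajectories sampled at different rates along the same route still come out as similar.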
For every historical trajectory, this process estimated the Most Similar Trajectory’s Destination (MSTD) port and returned how similar the two trajectories were, so that the similarity could be used as a weight for the prediction. In the above image, the green line represents a historical voyage being compared to every other historical voyage departing the same port. The red line represents the most similar trajectory, whose arrival port is the MSTD. In addition to the MSTD value, I also collected the similarity value, or weight, of the two trajectories and the current trajectory’s duration. These categorical and numerical values represented a trajectory in the final ML-based prediction method.
In order to ensure that the training set included voyages representative of real-life scenarios, I also split every historical voyage into several incomplete parts to simulate an ongoing voyage by a vessel that had not yet reached her final destination.
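The MSTD lookup itself is essentially a nearest-neighbour search among voyages from the same departure port. The sketch below is an assumption-laden illustration rather than the thesis implementation: `HistoricalVoyage`, `estimate_mstd`, and the port names in the test data are hypothetical, and the distance function is passed in as a parameter. The thesis used SSPD for this; `mean_pointwise_distance` is only a simple stand-in to keep the example self-contained.

```python
import math
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple

Trajectory = Sequence[Tuple[float, float]]


class HistoricalVoyage(NamedTuple):
    departure_port: str
    arrival_port: str
    trajectory: Trajectory


def mean_pointwise_distance(t1: Trajectory, t2: Trajectory) -> float:
    """Stand-in distance (NOT SSPD): mean distance between aligned points."""
    pairs = list(zip(t1, t2))
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in pairs) / len(pairs)


def estimate_mstd(
    query: Trajectory,
    departure_port: str,
    history: List[HistoricalVoyage],
    distance: Callable[[Trajectory, Trajectory], float],
) -> Optional[Tuple[str, float]]:
    """Return (arrival port of the most similar voyage, its distance).

    Only voyages that departed from the same port are considered. The
    caller is expected to exclude the query voyage itself from `history`.
    Returns None when no voyage from that port exists.
    """
    best: Optional[Tuple[str, float]] = None
    for voyage in history:
        if voyage.departure_port != departure_port:
            continue
        d = distance(query, voyage.trajectory)
        if best is None or d < best[1]:
            best = (voyage.arrival_port, d)
    return best
```

The returned distance plays the role of the similarity weight: the smaller it is, the more the initial prediction can be trusted.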
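One simple way to simulate ongoing voyages is to cut each complete trajectory into prefixes of increasing length. The sketch below does this with fixed fractions. The fractions (25 %, 50 %, 75 %) and the function name are assumptions for illustration only; this post does not specify how the splits were chosen in the thesis.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_partial_voyages(
    trajectory: Sequence[T],
    fractions: Sequence[float] = (0.25, 0.5, 0.75),  # hypothetical cut points
) -> List[List[T]]:
    """Return prefixes of a trajectory that mimic a voyage still under way."""
    partials: List[List[T]] = []
    seen_lengths = set()
    for fraction in fractions:
        # Keep at least two points so the prefix still forms a path.
        n = max(2, math.ceil(len(trajectory) * fraction))
        # Skip prefixes that are the full voyage or duplicate an earlier cut.
        if n < len(trajectory) and n not in seen_lengths:
            seen_lengths.add(n)
            partials.append(list(trajectory[:n]))
    return partials
```

Each prefix keeps the label (arrival port) of the full voyage it came from, so the model learns to predict the destination from an incomplete trajectory.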
After the initial pre-processing, voyage construction, trajectory simplification, and abstraction, the final process of constructing the final training dataset can be summarized as follows.
The training set had the following attributes:
After constructing the final dataset, and after encoding and balancing it, I trained an Extreme Gradient Boosting (XGBoost) classifier to predict the arrival_port value for the historical voyages.
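The encoding and balancing steps can be illustrated without any ML library. The sketch below maps categorical values such as port names to integer codes and balances the classes by random oversampling. Both choices are assumptions for this illustration: the post does not say which encoding or balancing technique the thesis used, and the helper names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def label_encode(values: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each distinct categorical value to an integer code 0..n-1."""
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


def oversample(
    rows: Sequence[list], labels: Sequence[int], seed: int = 0
) -> Tuple[List[list], List[int]]:
    """Duplicate random rows of smaller classes until all classes match
    the size of the largest class (random oversampling)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_label: Dict[int, List[list]] = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)

    target = max(len(group) for group in by_label.values())
    out_rows: List[list] = []
    out_labels: List[int] = []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

The encoded and balanced rows and labels could then be passed to a gradient boosting classifier such as XGBoost, with the encoded `arrival_port` as the target.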
The final trained model had a measured accuracy of 72% and could also be used to analyze the predictability of different vessel segments and sub-segments, as well as to identify correlations between size, capacity, and predictability. I found that some vessel segments and sub-segments were more predictable than others. For instance, passenger vessels embarking on frequent and short voyages were highly predictable, while other vessels with more complex traveling patterns were less so.
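Comparing the predictability of segments comes down to computing accuracy per group instead of over the whole test set. A minimal sketch, assuming each evaluated voyage is available as a (segment, true port, predicted port) tuple; the function name and that tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_segment(
    rows: Iterable[Tuple[str, str, str]]
) -> Dict[str, float]:
    """Compute prediction accuracy separately for each vessel segment.

    Each row is (segment, true_arrival_port, predicted_arrival_port).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for segment, true_port, predicted_port in rows:
        totals[segment] += 1
        if true_port == predicted_port:
            hits[segment] += 1
    return {segment: hits[segment] / totals[segment] for segment in totals}
```

The same grouping works for sub-segments, or for size and capacity buckets, by swapping the first element of each row.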
In the evaluation stage of the project, I also interviewed several shipping experts to gain insight into the commercial validity of my thesis. It was apparent that applying this approach to a global range of vessels could be highly relevant for companies that want insight into their competitors and want to forecast the availability of vessels in different ports and regions. Furthermore, if the proposed solution were combined with a method for predicting vessels’ ETA at their predicted destination ports, you could get an overview of how many vessels of different types will be available in different ports at different times in the future. This aspect seemed interesting to the shipping experts, who relied on similar analyses in their work.
There are many aspects and applications of this work that would be interesting to research further, and there are also limitations and possible improvements. As the proposed method is shaped as a modularized pipeline, it is designed to allow for iterations or improvements on the individual components. This should, hopefully, lower the bar for future improvements and further research.
The thesis set out to investigate the topic of AIS-based vessel destination prediction and maritime logistics, and how it can benefit the maritime industry. Although it has its limitations, it has, hopefully, provided insights into the challenge and complexity of this topic area and laid a foundation that can be extended in both an academic and a commercial sense.
There is a lot more information regarding the motivation, process, results, and discussions available in the thesis. It will be published on NTNU Open (https://ntnuopen.ntnu.no/ntnu-xmlui/) in the coming months. If you want to receive a copy of my master’s thesis, or if you have any questions or feedback, please send me an e-mail with a short introduction of yourself. I would love to hear your opinions and discuss the research area.