Nick Vennaro

AI Agent Based Implementation of “Cobots” in Healthcare Office Administration: A Cloud-Native Solution using Microsoft Technologies

Summary - full article on SSRN

The primary goal of this effort is to transform how physicians manage the administrative aspects of their medical practices, including appointment scheduling, workflow routing, reminders, referrals, contract analysis, and insurance processing, as well as a patient/physician portal that allows for secure, conversation-based interaction between patients and physicians. Rather than merely adding another tool for clinical care or administrative work, the mission is to offer a comprehensive service that minimizes the administrative burden and automates office functions to improve efficiency by integrating advanced AI and predictive machine learning technologies.

Full article including technical details on SSRN

Technical - Nick Vennaro

Transfer Learning and Pre-Trained Models

I have been reading the book Deep Learning for Coders with fastai and PyTorch, which I recommend for any deep learning implementer. There are many good ideas and specific implementations in the book. Early in chapter 1 the authors piqued my interest when they mentioned pre-trained models. As a practitioner and someone who has to show results quickly, pre-training and transfer learning have always been of interest to me. The authors report that "The importance of pretrained models is generally not recognized or discussed in most courses, books, or software library features, and is rarely considered in academic papers". This seemed strange to me, so I decided to dig a little deeper into the academic journals and look at various implementations to understand this topic more thoroughly, because pre-trained models are an integral part of an insights as a service solution for various industries.

Definitions

A pre-trained model is an ML model that has been trained on a dataset other than the one you are currently using - the weights and biases have been updated and the hyper-parameters tuned on this other dataset before your current dataset is introduced. Transfer learning is a related but different concept: here the pre-trained model was trained on tasks or domains different from the one you are using it for now. Goodfellow et al, in their 2016 book Deep Learning, defined transfer learning as a "situation where what has been learned in one setting is exploited to improve generalization in another setting". Generally speaking, the pre-trained model was originally built and tuned on a very large corpus of data and will be used in a setting for which it was not originally trained.
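To make the definition concrete, here is a minimal transfer learning sketch in PyTorch; torchvision 0.13+ and the 3-class head are my assumptions, not something prescribed above. The idea is simply to load weights already tuned on another dataset, freeze them, and train only a new head on your own, much smaller dataset.

    import torch.nn as nn
    from torchvision import models

    # Load a network whose weights were already trained on ImageNet
    model = models.resnet18(weights="IMAGENET1K_V1")

    # Freeze the pre-trained layers so their weights and biases are not updated
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final layer to match your own task, e.g. a 3-class problem;
    # training now only updates this head, which needs far less labeled data
    model.fc = nn.Linear(model.fc.in_features, 3)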

There are numerous advantages to using pre-trained models:

  • Data Requirements - if you use vetted pre-trained models, the amount of data required to train the model on your current problem is much lower. Getting access to, cleansing and labeling large datasets represents a significant cost in time and money, so savings here pay huge dividends in the future

  • Quality Assurance - using models that have been pre-trained and tuned can save you significant hours, and you can be more confident in your end results

  • Time to Market - ultimately, pre-trained models provide a time-to-market advantage. Time to market and fast implementations are always a primary concern for my customers. I have to have answers to the question "how do I deliver this sooner?".

Research Topic

I did a scan of the research and academic literature because engineering is the practical application of scientific knowledge, so it is interesting to see what is in the implementation pipeline. I found less research in the area of pre-trained models than I had expected. The bulk of the research I found, and the references to other work, falls in the 2015 to 2021 timeframe. Interestingly, there was a significant amount of research in the field of transfer learning in the early 2000s. The interest in transfer learning seems to have been spawned in 1995 by a NIPS (Neural Information Processing Systems) post-conference workshop entitled "Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems".

The good news is that there is a lot more research being conducted in the field of pre-trained models and transfer learning, so I would expect additional frameworks and implementations to be available in the coming years. At the end of this post I have included a reference list of the academic research I thought was most pertinent to pre-training and transfer learning.

Current Implementations of Pre-Training Models

Amazon Web Services: It may be true that pre-trained models got short shrift in the academic literature, but there are a fair number of them in the marketplace for the implementer to choose from. Considering AWS's dominance in the industry I will start there. As of this writing I have not implemented AWS Marketplace pre-trained models on SageMaker. We had a client with some interest in deploying on AWS, so we ran a few proofs of concept that worked well, but in the end the decision was to remain in house. While there are many models to choose from, we were not ready to commit to ML models from potentially unreliable vendors for a mission critical application.

PyTorch & fastai: I primarily use PyTorch, so fastai is a good candidate because it is built on top of PyTorch. fastai adds higher-level functionality - a layer of abstraction above PyTorch - that makes designing, developing, testing, and deploying your models easier. The folks at fastai are strong proponents of using pre-trained models, and much of the framework is built on the assumption that you will leverage pre-trained models in your implementations. fastai was founded as a non-profit research group by Jeremy Howard and Rachel Thomas and provides an open source solution for practitioners. Here you will find models for image classification, natural language processing (NLP), text classification, CNN learners, and more.
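As a rough illustration of how little code a fastai fine-tune takes, the sketch below follows the pattern of fastai's well-known pets example; the dataset choice and hyper-parameters are mine, not a prescription.

    from fastai.vision.all import *

    # Oxford-IIIT Pet images; filenames starting with a capital letter are cats
    path = untar_data(URLs.PETS) / "images"
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=lambda f: f[0].isupper(), item_tfms=Resize(224),
    )

    # vision_learner downloads an ImageNet pre-trained resnet34 and attaches a new head
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)  # a single epoch of fine-tuning gives a strong baseline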

I have found fastai NLP models very useful for sentiment analysis. In 2018 Howard and Ruder published a paper proposing Universal Language Model Fine-tuning (ULMFiT), a transfer learning method that can be applied to NLP problems to avoid training from scratch. Since then they have extended this work to non-English languages with a multilingual text classification model called MultiFiT, which builds on ULMFiT. Early research on the topic supports the use of pre-trained models for sentiment analysis with CNNs, especially when the amount of labeled data you have access to is very small, a problem we have encountered with clients. From Severyn and Moschitti's 2015 research paper Twitter Sentiment Analysis with Deep Convolutional Neural Networks: "When dealing with small amounts of labelled data, starting from pre-trained word embeddings is a large step towards successfully training an accurate deep learning system." A quick Google search will turn up numerous implementations and examples that leverage fastai's pre-trained models across a variety of domains, one of which will surely fit your area of interest.
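For sentiment analysis specifically, a ULMFiT-style classifier in fastai looks roughly like the sketch below; IMDB is the stock demo dataset and stands in for your own labeled reviews.

    from fastai.text.all import *

    # IMDB movie-review sentiment, the canonical ULMFiT demonstration dataset
    dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid="test")

    # AWD_LSTM is the pre-trained language model ULMFiT starts from
    learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
    learn.fine_tune(4, 1e-2)

    learn.predict("I really liked that movie!")  # predicted label plus probabilities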

BERT: BERT (Bidirectional Encoder Representations from Transformers) is an open source technique for NLP created by Google researchers in 2018. To give you some idea of the power of pre-trained models, Google claims that in about 30 minutes you can have your own state-of-the-art question answering model (or other similar model) up and running on a single Cloud TPU. BERT was pre-trained on a large corpus of text from Wikipedia. BERT has primarily been used for voice and text search and speech recognition. It has been fine-tuned for various domains - DocBERT for document classification, BioBERT (a biomedical BERT developed in Korea), and VideoBERT for video captioning and action classification.
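Google's released BERT code is TensorFlow based, but one common way to load a pre-trained BERT checkpoint today is through the Hugging Face transformers library; that library choice and the 2-class head below are my assumptions, not part of the original BERT release.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Pre-trained BERT weights with a new, untrained 2-class classification head
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    inputs = tokenizer("The claim was processed without any issues.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.softmax(dim=-1))  # class probabilities; only meaningful after fine-tuning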

Vendor Implementations: In addition to the options above there are vendor implementations and augmentations of models based on the pre-trained solutions described above, as well as implementations based on vendors' own in-house work. I have found vendor solutions helpful because they tend to be more domain specific, which makes for faster model delivery for Fortune 500 companies that use Insights as a Service for implementations. The cost of transfer learning is then borne by the vendor, and they can come to the table with better, more refined solutions. You will also be able to take advantage of, and tune for your individual needs, the ML models embedded in common products. An example of this is an interesting implementation of Splunk and TensorFlow being used to detect fraud based on an individual's mouse movements.

As you develop your company’s machine learning practice be mindful of the many advantages of pre-trained models.

References:

Devlin, Jacob, et al. Google AI Blog: Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing.

Esman, Gleb. 2017. Splunk and Tensorflow for Security: Catching the Fraudster with Behavior Biometrics.

Guo, Yuting. 2020. Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets.

Howard, Jeremy, and Gugger, Sylvain. 2020. Deep Learning for Coders with fastai & PyTorch.

Han, Xu, et al. 2021. Pre-Trained Models: Past, Present and Future. AI Open.

Kumar, Varun. 2021. Data Augmentation Using Pre-trained Transformer Models.

Tian, Hao, et al. 2020. SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis.

Vennaro, Nick. 2017. Buy (don't build) healthcare data insights to improve data investment ROI.

Technical - Nick Vennaro

Data Collection, Preparation & Preprocessing in ML

Data preparation, quality reviews and formatting are precursors to successful machine learning (ML) efforts. It's been my experience that clients consistently underestimate the effort and time required to get datasets ready for use on an ML project. Data-related issues range from security and access, quality, quantity, and low predictive power to validating the meaning of the data. In this post I will talk about these issues, as well as some others, and offer ways to mitigate and solve them. During a project it is sometimes useful to think of feature engineering as a sub-phase within the data collection and prep phase, but I have also seen feature engineering treated as a separate and distinct step in the ML project life cycle, primarily due to the specialized skills and algorithms required. I am not going to cover feature engineering in this post.

I have an earlier post on the business aspects of ML here. In this post I will discuss the more technical aspects of machine learning. As an engineer I am more interested in the technical aspects of the project, but it is always the non-technical issues that are the most time-consuming to overcome and that require a significant amount of planning and preparation. I think about the process of delivering ML results in four very broad categories:

  • Data Planning - data planning, raw data analysis, determining what data is needed, quality of the data, gaining access, data preparation and any associated preprocessing

  • Feature Analysis - numeric representation of the raw data, feature engineering

  • Model Development and Training - a mathematical or statistical analysis of the features and training of the model

  • Implementation and Maintenance - developing models and underlying processes that are executable on a regular basis, getting your models and the associated processes into a production ready state and maintaining them over time. Building scalable and repeatable processes.

I will cover the first broad category in this post: data planning and raw data analysis. This very first step in your ML journey is often more overlooked, under-appreciated and underestimated (in terms of time required) than any of the other categories. Implementing a robust and scalable machine learning practice in a large enterprise takes careful data planning up front. Most teams want to dive into the more interesting work of feature analysis, model development and training, but getting the right data, with the proper level of quality, at the right time is key to the success of the entire effort. This phase will always take longer and be more difficult than you first estimate.

What Did My Machine Learn?

The reason it is important to focus on what data to use, and to ensure you get data from broad and wide-ranging sample sets, is that the machine may not be learning what we humans think it is learning, even when it gives the correct answer. In her book Artificial Intelligence: A Guide For Thinking Humans (which I highly recommend), Melanie Mitchell discusses how one of her graduate students trained a neural network to classify images into one of two categories - those with animals and those without. There was a depth-of-field difference between the two groups: animal images tend to have a shallow depth of field with a fuzzy background, while landscapes and other pictures without animals tend to have a much greater depth of field, with foreground to deep background in sharp focus. The neural net did very well on the task, except that while the humans thought it was learning to discriminate between animals and non-animals, further testing indicated that what the machine had in fact learned was to categorize images into one group with blurry backgrounds and a second with in-focus backgrounds. There are numerous examples of this kind of thing happening in machine learning. The important point to remember is that understanding your data, reviewing it, looking for diverse datasets, and being attentive to overfitting is a key step in the use of machine learning.

What Data is Required

Generally, clients know what data they want to use to solve a given machine learning problem. I say generally because what is usually well known are the obvious choices - if we are working on a problem to increase sales we will want sales data, customer data and marketing data. Determining the less obvious choices that may help us answer the questions we are interested in is the hard work. What about external data - will we benefit from it, and if so, which data sources are valid and reliable? For the most complete model and the best outcomes, you should look for diverse data sources -- accessed from multiple sources (internal and external), across business domains, and at various points in time -- as this will aid in developing robust and accurate ML models. Once deployed to production, the machine learning algorithms will need to continuously read large, diverse data sets to keep the model results fresh and accurate over time, so you will have to be mindful of maintaining a steady flow of data.

There is no algorithm you can run to determine which data sources will give you the best outcomes. This information comes from the ML analyst's experience and from leveraging domain experts inside and outside your company. A few sessions with experts can generate ideas for new data sources that can be ranked and prioritized for deeper technical analysis and feasibility. At this stage it is a good idea not to limit thinking but to be open to possibilities even if there are issues. You can easily discard candidates that are not feasible at this time or put those data sources on your product roadmap for future model development. During one project's data analysis session we uncovered non-obvious ideas for new external data sources that were not immediately germane to the problem space, such as data from weather systems, government housing and income data, and census information. Over the years we have developed an internal set of job aids that help us document, score, rate, and prioritize data sources for analysis. Getting a tool for this is a wise investment - it will help you organize your thoughts related to data sources and assist in tracking issues associated with those sources. You will see how this tool is leveraged further in the sections below.

Data Availability & Definition

As you develop your list of candidate data sources you must determine whether the data exists, whether you can get access, whether there is a cost involved, and whether the data is available in the timeframe you need it. There may be security issues related to the data you are interested in; hopefully your organization has done a privacy impact assessment (PIA) to identify required privacy protections for the organization's data. This should provide you with the security information you require. You must determine if you are allowed to use the data for the intended purpose. For example, some pseudonymization or anonymization would be required if there was PII (personally identifiable information) in your data sets. Data masking or removal methods will then need to be employed. I typically prefer data masking because removal can often reduce the overall usefulness of your data. The two most common techniques used to mask data are:

  • Substitution cipher - each occurrence of restricted data is replaced by its hashed and encrypted value. You will need to use a salted secure hash algorithm to prevent repeatable values

  • Tokenization - a token, which is a non-sensitive equivalent, is substituted for the restricted data. The token is a pointer that relates back to the sensitive data through the tokenization system. It is generally considered more secure than a substitution cipher.

Depending on the technology and tool sets you are using, these algorithms are fairly straightforward to implement. There are also readily available tools on the market for this purpose.
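To make the two masking approaches concrete, here is a minimal Python sketch using only the standard library; a production system would manage salts and the token vault in a proper secrets store and database, which this toy version does not.

    import hashlib
    import secrets

    SALT = secrets.token_bytes(16)  # in production, manage salts/keys in a secrets store

    def mask_with_salted_hash(value):
        # Substitution-style masking: replace a restricted value with a salted SHA-256 digest
        return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

    _vault = {}  # toy stand-in for a tokenization system's secure mapping store

    def tokenize(value):
        token = secrets.token_urlsafe(16)  # non-sensitive token handed out in place of the data
        _vault[token] = value
        return token

    def detokenize(token):
        return _vault[token]  # only the tokenization system can map back to the real value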

Licensing restrictions and access to third-party data always have long lead times. Whenever I work on projects that require access to third-party data, the time it takes is directly related to the size of the company seeking the license. The larger the company, the longer it takes. More lawyers, more restrictions, and more questions about IP all take time to sort out - make sure this is in your planning.

What do you know about the data - are the data and metadata defined? You need to understand the formats (structured and unstructured), frequency of refresh, locations, owners, quality indexes, etc. Will this data be available long term, and can you get access to the data you need for the timeframe of the project?

If you work in an organization that has well-defined data ownership tools and processes you are in luck, as this will save you hours of work. If you don't have that luxury then a poor man's approach may be needed, but either way a data analysis plan and output that takes into consideration all the items discussed here needs to be maintained; this will prove invaluable as the project progresses. The data information (including metadata) that is documented during your machine learning project should be formally documented and saved in the appropriate repositories for later use. During this phase, I also like to document project-related information - data definitions, metadata, why some data was included and excluded from analysis, data imputation details, etc.

Data Quality

When people think of poor quality data they are typically thinking of noise in the data. That is, the data is corrupted in some way - blurry images, formatting issues such as lost spaces in text or misplaced decimal points in numbers, missing values, or audio that is incomprehensible and cannot be transcribed. Depending on the extent of the missing data and the size of the data set, it can be acceptable to ignore missing data and let the algorithm sort out the issue. However, if the data set is relatively small (in the thousands of samples) you run the risk of overfitting, and the data set will need to be fixed or you may end up modeling the noise within the data. In a big data environment, if the noise is random then the law of averages will take over and missing values will be averaged out. Or, if the proportion of missing observations is small with respect to the overall size of the data set, then the affected observations can be removed entirely. I try to avoid deletion as it tends to be overused and biased estimates can easily slip into the process.

Data imputation techniques are used by the data analyst to infer the missing attributes. The most common imputation methods are listed below:

  • Mean or Median Value - Calculating the mean value for the existing features and using the calculated value as a replacement for the missing feature value is by far the fastest and easiest method to infer the missing attributes. The primary drawbacks to this method are that it cannot be used on categorical data, there is no way to account for correlations between features, and it is not always the most accurate method.

  • Mode Value - Imputing the data based on the mode is also a fast and easy method to infer missing values and it can be used on categorical data. Mode has the downside of potentially adding bias to the data set. A related imputation method is the creation of a new category such as “missing” which allows you to not lose track of the data you have changed and limits bias.

  • Random Sample - If your data is normalized and the data is missing at random a quick way to impute data is to randomly select values from the existing attributes to impute the missing values.

  • kNN - k-nearest neighbors is usually thought of as a classification algorithm, but the process can also be used to impute missing values: find the samples in the training set that are nearest to the record with the missing value, then average those nearby points to fill it in. In other words, kNN imputation is a donor-based method where the imputed value is the average of measured values from the k nearest records.

    This is an enormously important area and a lot more could be written, but I want to spend some time on two of the hyper-parameters that get tuned when using kNN for imputation. Selecting the optimal value for k and an appropriate distance metric is crucial for the data scientist, as these balance under- and overfitting. In a subsequent post I will write more about the mathematics behind the kNN classifier, but to give you a feel for the importance of the value of k see Figure 1 below.

[Figure 1: kNN classification of an unknown point, comparing k=3 and k=7]

In this somewhat extreme example you can see how the value of k set to 3 vs a value of 7 becomes important in classification of an unknown value represented here with a “?”.

In a recent post I talked about churn modeling and how we tuned customer marketing treatments based on new external data sources; we used kNN in that analysis.


Implementation Advice: Python implementations of kNN for imputation are best done using the machine learning library scikit-learn (sklearn). The KNNImputer class provides a way to complete missing data values based on the kNN algorithm, allowing you to specify values for k and a distance metric (default nan_euclidean). In another post I will discuss the different distance measures for kNN algorithms.
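A minimal sketch of KNNImputer on a toy array; the values are made up purely to show the mechanics.

    import numpy as np
    from sklearn.impute import KNNImputer

    X = np.array([
        [1.0, 2.0, np.nan],
        [3.0, 4.0, 3.0],
        [np.nan, 6.0, 5.0],
        [8.0, 8.0, 7.0],
    ])

    # n_neighbors is the k discussed above; the distance metric defaults to
    # nan_euclidean, which ignores missing coordinates when measuring distance
    imputer = KNNImputer(n_neighbors=2, weights="uniform")
    print(imputer.fit_transform(X))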

Data Sampling

Earlier in my career it was common to deal with the issue of too little data rather than too much; now the situation is reversed. We are experiencing a data explosion - a rapid increase of information availability (Wikipedia). By one estimate (Statista) the volume of data/information created, captured, copied, and consumed worldwide has increased from 2 zettabytes in 2010 to 41 zettabytes in 2019 and is expected to reach 181 zettabytes in 2025. Recall, a zettabyte is 1E+9 terabytes. This huge influx of data is driving the need for data sampling within ML projects. It is neither necessary nor efficient to use all the data in the data sets that may be of interest, so you will need to become familiar with data sampling strategies. Formally, sampling is a statistical process you employ to select a subset of objects from the larger population; this subset defines your sample or observation set. Like many topics in AI and ML, data sampling can become quite nuanced, but for most practical purposes you will use one of the three sampling methods below. These are probabilistic sampling methods, where each data value has a chance of being selected, giving you a good representation of your population.

  • Simple Random Sample - a probability-based method where every entry in your data set has an equal chance of being selected. While simple to implement using any random number generator, it can be problematic if it under-represents or entirely misses a minority characteristic that proves to be important

  • Interval Sample - with interval sampling the first value is selected at random; this is your starting value. Beginning at your starting value, select every kth element until you reach the desired sample size. Set the value of k such that it gives you the proper sample size and traverses the population set

  • Stratified Sample - stratified sampling is done by creating groups, or strata, based on some characteristic of your data and then randomly selecting a sample from each stratum. For example, in one ML project we were interested in analyzing churn and how our population in different income groups responded to various treatments. We created strata based on income brackets and selected a sample from each income bracket that was commensurate with its representation in the population as a whole. To perform stratified sampling you of course need a good understanding of the underlying data.

The sampling techniques above are easily implemented with scikit-learn, Python and pandas using the classes provided.
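For example, the three methods might look like this in pandas and scikit-learn; the customers.csv file and the income_bracket column are hypothetical stand-ins for your own data.

    import random

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")  # hypothetical dataset

    simple_random = df.sample(frac=0.10, random_state=42)  # simple random sample

    k = 10
    start = random.randrange(k)      # random starting value
    interval = df.iloc[start::k]     # interval (systematic) sample: every kth row

    # stratified sample: keep income_bracket proportions in the 10% sample
    stratified, _ = train_test_split(
        df, train_size=0.10, stratify=df["income_bracket"], random_state=42
    )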

Encoding Categorical Data

Input data to machine learning algorithms will most likely include categorical data as well as numerical data. While it is true that some ML algorithms, decision trees for example, can use categorical data without encoding, many other algorithms cannot operate on labeled data directly, so these features will require some type of encoding. Examples of categorical variables are: "color" with values red, blue, green; "fabric" with values cotton, silk, wool, polyester; or "size" with values S, M, L, XL. Color and fabric are examples of nominal categorical features, where no order is implied between the values, whereas size is an ordinal categorical feature, where an order is implied: XL > L > M > S.

Ordinal encoding can be used on ordinal values, transforming the text to numerical values to accommodate ML algorithms. The data for size could be encoded as 0=S, 1=M, 2=L, 3=XL. A reverse mapping dictionary and process can easily be implemented to retrieve the more meaningful values for reporting. It is not recommended to use ordinal encoding on nominal categorical data, since ML algorithms would then treat features such as fabric and color as ordered, which, as we have seen, does not make sense. For nominal features, one-hot encoding is commonly used.

The idea with one-hot encoding is to convert each unique value in the nominal set to a dummy feature (which is why this method is sometimes called dummy encoding) that takes on a binary value. For the fabric example above, four new features representing each fabric would be added to the data set - cotton, silk, wool, polyester. A binary value would be assigned based on the sample's fabric type: if the row was for a cotton fabric, the values would be cotton=1, silk=0, wool=0, polyester=0. You can see why it is called one-hot encoding - one value is on, or hot, while all the others are off. One-hot encoding can perform very well, but it is easy to see that you will quickly have feature expansion depending on the size of k, where k is the number of unique nominal values. As with all my posts, the easiest implementations are done with scikit-learn, pandas and Python, and one-hot encoding is no exception. OneHotEncoder, in the sklearn.preprocessing module, encodes categorical features as a one-hot array. You should also be aware of a handy class in sklearn.compose called ColumnTransformer, which allows you to selectively transform individual columns in an array or a pandas DataFrame.
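A short sketch of both encoders working together through ColumnTransformer, using the fabric and size examples from above:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    df = pd.DataFrame({
        "fabric": ["cotton", "silk", "wool", "polyester"],
        "size":   ["S", "XL", "M", "L"],
    })

    ct = ColumnTransformer([
        # nominal feature -> one dummy (one-hot) column per fabric value
        ("fabric_onehot", OneHotEncoder(), ["fabric"]),
        # ordinal feature -> 0=S, 1=M, 2=L, 3=XL, preserving the implied order
        ("size_ordinal", OrdinalEncoder(categories=[["S", "M", "L", "XL"]]), ["size"]),
    ])

    print(ct.fit_transform(df))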

Implementation advice: for illustrative purposes the encoding examples above are fairly trivial, but in real-world implementations these can get lengthy and extensive. In most industrial-strength development efforts I work on, we build specific dictionary-mapping sub-modules for encoding and decoding data. We develop them to be robust and extensible as they are used frequently.

Partitioning Data

Up to this point we have been writing about datasets in their entirety. In practice, you will need a process to partition the data into three subsets - training, validation and hold-out (sometimes called a test partition). As a rough estimate you can think of the percentage of data split across the three partitions as 60, 20, 20 respectively. Or, if you only need a training and test dataset, the split could be 70, 30.

  • Training dataset - this data partition is used to train your model(s). With supervised learning the training data is labeled and is used by your ML algorithms to “learn” relationships between features and the dependent variable.

  • Validation dataset - this data partition is used to tune the model's hyperparameters and determine/maximize model performance. If you are testing multiple models, this is the data you will use to determine which model fits the validation data best; that model will move on to the next phase, which is testing against the hold-out data.

  • Hold out or test partition - this partition is used to determine final model performance on data that it has never seen before.

In summary, you fit/train your model on the training data, tune it and make predictions using the validation data, and perform final testing with the model that fits the validation data best.


The code snippet below will give you an idea of how to use sklearn LabelEncoder and train_test_split on your data. In this example I used the Wisconsin Breast Cancer dataset so you could see an example with real data using pandas and sklearn classes. I cut the dataset down to a few rows and columns for readability and put in print statements so you would be able to see what is happening at each stage. In the data you will see that ‘M’ and ‘B’ representing malignant and benign get transformed to 1 and 0 respectively. I also call train_test_split to demonstrate a 70:30 split of the data.

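A sketch approximating the described snippet; the file name and exact column layout are my assumptions, not the post's original code.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder

    # Wisconsin Breast Cancer data; file name and columns here are illustrative
    df = pd.read_csv("wdbc.csv")
    df = df.iloc[:10, :5]      # cut down to a few rows/columns for readability
    print(df)

    le = LabelEncoder()
    df["diagnosis"] = le.fit_transform(df["diagnosis"])  # 'B' -> 0, 'M' -> 1
    print(le.classes_)
    print(df["diagnosis"].values)

    X = df.drop(columns=["diagnosis"])
    y = df["diagnosis"]

    # 70:30 split into training and test partitions
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
    print(X_train.shape, X_test.shape)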

Typically, you will randomize (shuffle) the raw data before doing any partitioning.

The data collection and prep processes will need to be run repeatedly so it is best to build scalable and robust modules for this purpose. When you are just starting your ML journey it is tempting to build quick and dirty scripts or to try and take short cuts related to data preparation but time spent here will be time saved elsewhere. Good data and a solid methodology for processing and preparing it is foundational to your successful ML practice.

Strategy - Nick Vennaro

Insights as a Service - Using External Partners for AI/ML

Whether you are just starting to think about implementing an AI practice using machine learning (ML) or want to expand your existing ML operations, insights as a service is an alternative that corporate CxO’s should consider. Insights as a service is a methodology that allows business units to pay by the drink for data insights related to specific high value business problems.

Definition of Insights as a Service

Insights as a service is a methodology for leveraging an external partner for some of your ML requirements. The partner will provide tools, ML models, compute resources, trained data scientists and domain experts, and most importantly access to a trove of external data for you to leverage during your analysis. The buyer pays by the drink for this service, meaning you pay for the insights you receive. We have designed deals between parties where the buyer only pays for actionable results; if there is a dry hole, i.e. no actionable information, then you don't pay for that effort. There are many other ways to set up contracts to incentivize the partner to produce what you ultimately want - results that move the needle for the business. Further on in this post I have a case study showing some of the details of a successful implementation.

Why use Insights as a Service

It's not an entirely hands-off process when you invest in insights as a service; I will cover the buyer's responsibilities in the implementation sections below. You should consider using Insights as a Service for one reason - speed to market, getting actionable insights sooner. If you choose your vendor/partner wisely you will be able to get insights sooner because that partner brings their own data, including licenses for external data, pre-built and tested ML models that can be refined for you, and trained staff ready to deploy to your project. For companies just getting started with AI, I have seen implementations that typically take 8 to 12 months from start to finish get cut to 2 or 3 months. Even for companies with a mature AI/ML practice, an insights as a service provider can augment your internal team, providing additional expertise, new data sources and staff that can cut your project backlog and provide the business areas with valuable insights for decision making.

Implementing Insights as a Service

Insights as a Service programs that I have put together generally fall into one of two groups: the first is a mature organization that wants to add additional throughput to meet increasing demand or to provide some expertise that it lacks. The more challenging case is the second situation, where the organization wants to get started with ML and is looking for a way to show real progress in a reasonable amount of time. It is this second scenario that I am going to focus on here.

Getting Started in AI

The process of starting an AI program is the same as introducing any other significant new technology into the enterprise, with the difference that AI will bring greater opportunities and implications for your business. AI is considered by many to be a general-purpose technology - a technology that brings profound and lasting changes that ripple across the globe, spawning new industries and ways of working and changing how people live. For example, electricity was such a technology - it significantly changed how people lived and worked, new products were created, and it spurred explosive growth and disruption. Introducing AI and machine learning into the organization takes a significant amount of up-front work.

For this post I will approach AI/ML introduction from the perspective of the Chief Technology Officer (CTO), as they are a likely advocate for the technology. As a senior leader, the CTO must take on the role of educator of her peers at the senior leadership level - helping them understand in business terms what AI/ML is and is not, the kinds of problems it is best applied to, and what kind of returns and new opportunities could open up because of the new technology. At the heart of the hard work and sometimes long process of this education is good old-fashioned marketing. In the past I have assisted CTOs with this effort by leading tours and holding Q&A sessions with other companies who have successfully and recently started this journey, bringing vendors in for short "art of the possible" meetings, engaging speakers (technical and business) to discuss industry trends, and emailing short pieces that explain AI/ML. I would also recommend reading Andrew Ng's AI Playbook, which I have followed in the past for successful POC/MVP projects.

Thinking "stractically": not too strategic, not overly tactical

There is a trade-off between strategic and tactical thinking - in the beginning phase I recommend a happy medium which we call "stractical". You want to think strategically but don't get too ivory tower; you want to be tactical but not short-sighted. The CTO needs to help senior leadership balance these two competing interests. We help customers think about digital transformations, and if there ever was a technology that will drive transformation it's AI/ML. Be sure to think about the strategic direction you want to pursue, for example: how will this new and powerful technology allow me to serve my current customers better and open new opportunities, what are my competitors doing in the space, who are my competitors now and does AI change the competitive landscape, how will I avail myself of new data and new sources of data, what new business innovations does this open up for us, how do I adapt to my customers' changing needs and values, and how will I build an adaptive value proposition? As you, the CTO, think through these areas you will simultaneously be thinking about which project(s) you should choose as a pilot - a proof of concept or, as I prefer to think of it, a minimum viable product (MVP).

Conducting the training and education sessions while guiding the conversation between a strategic and tactical plan is necessary for success in your organization. You will uncover opportunities and build trust and gain supporters, which you will need to be successful.

Choosing the Right Project

Choosing the right project for your first implementation is important as you want to maximize your chances for success as well as your impact on the business. Some project characteristics to think about:

  • Transparency of the model: If you need to understand why your models made a certain prediction or why they didn’t make a different prediction you most likely won’t be able to get this information using current ML algorithms - for the most part ML algorithms in use today are black boxes even to the inventors.

  • Data availability: Is the data you will need available, accurate, usable, and sizable? Do you need and can you get rights to external data sources?

  • Industry related project: Don’t pick a project that is unrelated to the business you are in. If you work for a medical device company don’t pick a project that uses ML to make staffing/hiring decisions more quickly. Look for a project that moves the needle for the business domain you are in so when you are “selling” the ideas and techniques demonstrated from a successful implementation it’s readily apparent how it applies to the business at hand.

  • Too large vs trivial: As you think about your first foray into machine learning you will need to balance the trade off between a project that is too large vs one that is just trivial. Look for something that will produce a relatively quick win and demonstrate to senior leadership that this is a direction that is worth investing in for future growth. Ideally you will look for a project that will positively impact revenue growth.

Choosing the Right Partner

This post is focused on Insights as a Service, the crux of which is partnering with a company that can augment your organization’s skills and tools so you can quickly maximize your chances of success in implementing a new technology. With machine learning you need to look for a company that fits culturally, brings strengths where you have deficits and vice versa, has relevant business domain experience, brings pre-trained models, has labeled data from external sources along with the rights to use that data, a deep bench of data scientists, software engineers, and statisticians, and brings the compute power and tools necessary to run a machine learning project.

A good partner will be able to assist you with all phases in the lifecycle of an ML project. In a coming post I will discuss the ML project lifecycle in detail, but for now understand that this is a multi-step process that begins with business goals and direction and runs through data collection and analysis, feature engineering, model development, training and maintenance, and model deployment. When you are choosing a partner it is incumbent on you to evaluate them across this lifecycle to ensure that they have the wherewithal to help you in this journey.

Be aware of your financial goals as you choose an ML partner and how you wish to incentivize them for your success. Seek out a partner who is open to new, more modern financial arrangements such as gain-share or risk/reward-share models for success.

Reserved Data: Test Set

When using an Insights as a Service partner the most important item to remember is to hold data back from your partner to use as a final evaluation after they have completed their model building, training and validation testing.

Data for ML is split into at least two sets - a training set and a validation set. The training set is used for training your models. The validation set is used by the ML developers to evaluate the model's performance. In theory the model is being validated on data it has never seen before, so if it is making accurate predictions it is because it has learned the characteristics of the data, not because it has seen the data before. It is very similar to studying for a test in college - if students study by reviewing practice problems, it is generally not a good idea to give them the same questions on the test that they used in practice; it will obviously bias the results.

A test set is a third data set. This is a highly reserved data set - it has never been used by the models for updating weights or biases, nor has it been used for validation; as a matter of fact, it has been kept hidden from the model builders themselves. It is used solely to evaluate the model at the end of your efforts. You will want to pick a metric that is useful to your business need and requirement set to ensure that the model will meet all your expectations. This step of holding back a test data set is absolutely necessary for a successful Insights as a Service program.

Case Study

We were contracted by an entertainment content provider to help them modernize their business intelligence area, which included creating an AI/ML group. They wanted to move quickly, get quick wins and influence the business to show the promise of ML. We proposed an Insights as a Service model where they would pay for useful insights and not be charged for unsuccessful attempts. We implemented a vetting process to evaluate candidate ideas for projects, reviewed those candidates against financial, technical and business criteria, and then ranked them to be added to the backlog accordingly. This process then fed the ML project lifecycle (discussed above) for eventual deployment. Simultaneously we were evaluating potential partners who fit the criteria we defined - criteria based on the characteristics outlined earlier. The chosen provider was eager to prove that they had the experience and ability to work with the client and was amenable to a financial approach based on success criteria that was a win/win for both parties.

Choosing the first project was a difficult decision. We wanted it to be impactful, in the company's primary business area, and not too big or unwieldy. We settled on the somewhat risky decision to focus our effort on churn analysis and how to augment the current churn models with new data sources and model updates that would recommend changes to customer offers based on this new information. Churn, and how to minimize it with hyper-targeted customer offers, was a key measure of success in the industry, so any changes to the churn model were scrutinized very carefully. In the end, our MVP/POC was successful: churn was significantly lowered and we did it with a lower overall spend on customer offers.

With that first project proven successful, we implemented subsequent projects for further evaluation and eventually built the program up to run at enterprise scale.

Whether you are augmenting an existing practice or just starting with an MVP, insights as a service should be considered as part of your business intelligence or machine learning strategy.

Technical - Nick Vennaro

Data Design and Microservices

As companies move to a microservices architecture a few areas of data design need to be considered: data sovereignty and database complexity, including the use of foreign keys and joins. Data sovereignty and its implications for your microservices architecture is sometimes a contentious topic that comes up during the design phase. One of the rules, or at least strong guidelines, of microservices design is that each microservice owns its domain data and its logic. The services should be loosely coupled - a change in one service does not require a change to another service - and highly cohesive - services that perform a similar function reside together and services that meet a different functional requirement reside elsewhere. Related to loose coupling is data sovereignty, which is a difficult concept to grasp when you are used to dealing with large monolithic systems that have one large data store with foreign key constraints.

Data Sovereignty

Data sovereignty is a design choice where each microservice, or a set of highly cohesive services, owns its own database - there is not one large DB as there typically is with a monolithic system, and the loss of a "single source of truth" has to be addressed. This is a more difficult design to implement: you have data split across databases, communication costs have to be managed, and sometimes a single ACID transaction cannot span databases (eventual consistency needs to be considered).


Data design for microservices - note the separate data stores for each app area based on functionality.

Queues can be used for async transactions across multiple services.

Foreign Keys & SQL Joins

Foreign keys are another consideration when building out microservices. FKs are usually a given in monolithic systems, but this is not necessarily so with microservices. I want to be clear here: you definitely can use FKs with microservices, but it's not always the default decision. SQL joins are another area of complexity when using an RDBMS; they are a powerful tool but can be difficult to debug and scale. Microservice teams have often opted to limit SQL joins in favor of a caching mechanism to offload work from the DB.

If you are reading this you already know what foreign keys and SQL joins are and why they are used, so the question is why would you not always employ them with microservices? When teams are very small and operating without a dedicated database staff, simplicity of logical/physical database design is often a priority, so constraints are moved out of the DB layer. As discussed above, designing for data sovereignty means encapsulated data, and microservices teams tend to be small and nimble, often without a full-time traditional DBA role. Scalability is another reason to move away from more complex DB designs. The DB layer can be the most difficult area to scale, so making it less complex and moving the workload to a caching tool is another design option. To improve DB simplicity, ease maintenance, improve performance and scalability, and lighten the DB load you can do key-value lookups using an in-memory data store. An early choice for this was Memcached, which remains a viable option for ease of use, but Redis seems the better option these days. Redis is feature rich and part of the portfolio in Amazon Web Services and Microsoft Azure, making it an easy choice for most cloud deployments. I have seen benchmarks where caching improves key-value lookups from O(n) to O(1).
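As a rough sketch of this cache-aside pattern in Python with the redis-py client; the connection details and the query_customer_db helper are hypothetical, not part of any specific product setup.

    import json

    import redis

    cache = redis.Redis(host="localhost", port=6379, db=0)  # assumes a local Redis instance

    def get_customer(customer_id):
        # Check the cache first; fall back to this service's own database on a miss
        key = f"customer:{customer_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)              # O(1) key-value lookup
        record = query_customer_db(customer_id)    # hypothetical call into this service's DB
        cache.setex(key, 300, json.dumps(record))  # cache the result for five minutes
        return record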

A microservices architecture does not prevent you from using the DB as you would with a monolithic system; if you do, just make sure you have a strong DB staff on the team. However, if you are making the move to microservices you should consider all options.

Technical - Nick Vennaro

Microservices Tooling

Previously in this three-part series, I wrote about how to introduce microservices in a legacy environment and provided an overview of domain driven design (DDD) and how this development philosophy can be used to represent the real world in code, while also being well-suited to a microservices implementation. This time, I will cover some of the tools and frameworks that can be used when implementing microservices.

I deliberately saved the tools discussion for last because I often find clients like to jump into tools before they have completely thought through why, and if, a new architectural approach can or should be implemented in their environment. After you have decided that you should move to microservices then it’s appropriate to think about how it can be done and with what tools. 

Containerization

Microservices architectures do not require containerization, but it is one set of tools that will make your life easier on a number of fronts. 

The metaphor of a shipping container is used because through standardization containers allow goods to be transported by different shippers via ship, rail, or truck regardless of items in the container, knowing that the contents of one container will not affect the others. These containers can be stacked and moved easily.  

In software, containerization facilitates the packaging and deployment of services across environments with no modifications. Additional benefits of containers include:

  • Application isolation

  • Application scalability; containers allow you to easily scale your application services up/down as needed

  • Application portability

  • Allows for, but does not require, microservices implementation

  • Continuous integration/continuous delivery (CI/CD)

As you can imagine, there are tools in the marketplace that allow you to containerize your applications and manage/orchestrate those containers in your environment and in the cloud. The three leading products are Docker, Kubernetes, and Mesos, with Docker having become the industry de facto standard. There are some use cases where one of the other tools may make more sense, such as Mesos for extremely high scalability (e.g. Twitter and Apple).

Developing microservices is only the beginning; the effort and thought necessary for testing, deploying, running, and maintaining services should not be underestimated. To a large extent, the types of tools and techniques you will select will be driven by your existing technical infrastructure. If, for example, you are a heavy user of IBM products then you will be looking to them for solutions. 

I have come to believe that the cost of adopting and developing with new tools and frameworks, or in a new language, outweighs the benefits, so you should be very careful if you plan to move from your primary vendor's ecosystem to a new one. There is, however, a rich set of tools for microservices deployment and maintenance to choose from that can be used to complement or create your DevOps environment. Here is an overview of some of these tools and frameworks.

For CI/CD build management and automation the most widely used tool is Jenkins. Jenkins has a rich library of plug-ins that extend it to almost any external tool in support of the CI/CD process. Jenkins runs on multiple operating systems: Windows, OS X, or Linux. Configuration of Jenkins is not difficult, the tool has strong support in the industry, and hiring staff -- which should always be a consideration -- is manageable.

There are, of course, other tools. I have worked with companies that are 100 percent Microsoft and wish to stay in that environment whenever possible. Microsoft offers a product called Team Foundation Server (TFS), which has also garnered strong support in the industry. TFS is tailored for the Visual Studio IDE but will work with Eclipse and other integrated development environments. In addition to support for the usual CI/CD functionality, it offers integration with Microsoft's cloud platform and tools such as Azure Service Fabric and Azure API Management (for gateway and portal services). Combining Docker and CI/CD tools can have a profound effect on your microservices implementations and deployments.

Testing of microservices poses some unique challenges due to the distributed, interconnected, but loosely coupled nature of their design. The basic principles of testing should be familiar - in 2009, Mike Cohn described a testing pyramid that has, with some modifications, been widely adopted. This approach is applicable to microservices testing, and the need for automation may be even greater now.

Here again, vendor frameworks can help. Microsoft Team Foundation Server offers a full suite of testing tools and services that work on Azure. Jenkins, too, comes with microservices test tools. In addition, some specialized testing tools are available, such as Rest-Assured for Java testing of REST services and WebInject for testing.

Lastly, I want to touch on using microservices for front-end UI development. It is typical for companies to start their journey with server-side microservices. As you move up the maturity curve it will become apparent that a monolithic UI is a bottleneck in your development/delivery process as the front end becomes more unwieldy. To solve this I have used a composite UI with microservices. You can develop a composite UI with traditional tools such as ASP.NET, and you can augment your development process with services and libraries like Project Mosaic (https://www.mosaic9.org).

I hope this blog series has helped you gain a better understanding of how to begin your journey to microservices in a legacy environment, how domain-driven design can be used to jumpstart your design efforts, and the tools that are available to facilitate your work.

Ideas - Nick Vennaro

Plato, OO Design, and Why You Should Read Widely

Like many in the field I read a lot of computer science material - there are always topics to keep up on. I was recently reminded of the importance of reading non-computer science material like novels, history, art, and music as a source of ideas or inspiration.

I was reading about Plato's theory of Forms in a text on the history of western philosophy. I immediately thought of object-oriented design and programming, as the similarities between the two are uncanny. A quick Google search reveals that others had noticed this before I did. While it is interesting to read Plato and observe that his ideas have parallels to something that already exists (in this case OOD/P), it is of course another matter entirely to read Plato, experience the light bulb moment, and extend Plato's work to new areas - that is why we should all try to read beyond our chosen field.

The topic of Forms and OOD/P is covered in many other places – see SpringerLink for an article that discusses the various similarities between philosophical theories and computer science.

Plato and Object Oriented Design

Briefly, Plato was studying and thinking about metaphysical concepts when he developed the theory of Forms, where a Form is a template or a pattern for some real-world object or concept. A Form is not the real-world object but a representation of it, and it contains certain attributes and behaviors. In Plato's theory humans have links to these Forms and demonstrate the Form's characteristics – in OOP this would be similar to instantiating an object from a class. The object (an individual human) then has the attributes and functions of the class (Form). An example from A History of Western Philosophy by Steven Evans is the concept of courage – Samson or a lion is said to be courageous, but neither one is courage. Courage is a Form; it has attributes and behaviors, and Plato saw it as an unchanging reality distinct from the objects that display courage, which are themselves constantly changing.
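For readers who like to see the analogy in code, here is a tiny Java sketch; the Courageous, Lion, and Samson types are invented purely for illustration.

    // A Form, in Plato's sense, maps loosely onto a class: a template that
    // defines attributes and behaviors without being any particular thing.
    abstract class Courageous {
        abstract String name();

        // The behavior that every participant in the Form exhibits.
        void faceDanger() {
            System.out.println(name() + " stands firm in the face of danger.");
        }
    }

    // Individuals "participate" in the Form, much as objects are
    // instantiated from a class.
    class Lion extends Courageous {
        String name() { return "The lion"; }
    }

    class Samson extends Courageous {
        String name() { return "Samson"; }
    }

    public class FormsDemo {
        public static void main(String[] args) {
            new Lion().faceDanger();   // the lion displays courage
            new Samson().faceDanger(); // Samson displays courage
        }
    }

Neither object is courage itself; each merely exhibits the attributes and behaviors the Form defines.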

As stated at the beginning of this short post, it is always interesting to see connections between computer science and other disciplines, but the hard part, and the real magic, is to make the connection, extend the work, and create something new.

Read More
Technical Nick Vennaro

From Domain Driven Design to Microservices

As many of you may recall, the software design and architecture style known as service-oriented architecture (SOA) emerged in the mid 1990’s. Since then, we have discovered better ways to build systems, including advances in cloud-based virtualization, continuous integration and delivery, and microservices. In the process, these technologies have made SOA and all the associated benefits a reality.

As many of you may recall, the software design and architecture style known as service-oriented architecture (SOA) emerged in the mid 1990’s. Since then, we have discovered better ways to build systems, including advances in cloud-based virtualization, continuous integration and delivery, and microservices. In the process, these technologies have made SOA and all the associated benefits a reality. 

In March, I published the first post of this three-part blog series – How to Introduce Microservices in a Legacy Environment – explaining how microservices can be introduced into a large organization with well-established legacy systems. In this post, I will cover domain-driven design (DDD) and how this development philosophy can be used to represent the real world in code while being well-suited to a microservices implementation.

Domain-driven design 

Cohesion is an early tenet of software design and refers to the degree of functional relatedness that exists inside a module or class. In this context, cohesion was first described in the late 1970s by Tom DeMarco and has come to mean grouping together those things that change for the same reasons and separating the functionality that changes for different reasons.

DDD provides a method to facilitate the development of highly cohesive systems through bounded contexts. Microservices is an implementation approach that encourages you to focus your service boundaries on the business domain boundaries. DDD and microservices can be used together as you move your organization to a service-oriented design and reap the benefits of continuous integration and delivery.

The seminal work in DDD was defined in a 2003 book by Eric Evans called Domain-Driven Design: Tackling Complexity in the Heart of Software. The overarching philosophy of DDD is to use the notion of bounded contexts, which form protective layers around the models that define the business domain. Bounded contexts are analogous to departments in a company – the legal department has certain specific responsibilities (contexts) that are different from those of the IT department, and those responsibilities are enforced by rules (boundaries) for interacting with and obtaining services from each department.

This is the same for bounded contexts that we model using DDD. To facilitate a common understanding of the problem domain and translate that domain knowledge into a computer system, the business and technical teams must develop a common language. In DDD this common language is called the ubiquitous language (UL). As the technical staff develop their models and code, they use the UL to decrease the risk of misunderstanding between the business analysts and the engineering staff as the project progresses.

This also serves as an additional layer of documentation and enhances the organization's understanding of how a system was designed and intended to work. The UL ties the analysis models used to understand and define the domain to the code models used to create the software.
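As a small, hypothetical illustration, here is what code written in the UL might look like for an imagined insurance domain whose language includes the terms policy, renew, and lapse; the names and business rules are assumptions made purely for this example.

    import java.time.LocalDate;

    // The class and method names come straight from the ubiquitous language,
    // so a business analyst can read this almost as easily as the analysis model.
    class Policy {
        private final String policyNumber;
        private LocalDate expiresOn;
        private boolean lapsed;

        Policy(String policyNumber, LocalDate expiresOn) {
            this.policyNumber = policyNumber;
            this.expiresOn = expiresOn;
        }

        // "A policy is renewed for a further twelve months" (phrased in the UL).
        void renew() {
            if (lapsed) {
                throw new IllegalStateException("A lapsed policy cannot be renewed: " + policyNumber);
            }
            expiresOn = expiresOn.plusMonths(12);
        }

        // "A policy lapses when the renewal date passes without payment."
        void lapse() {
            lapsed = true;
        }

        LocalDate expiresOn() {
            return expiresOn;
        }
    }

When the business changes how it talks about the domain, the code model changes with it, keeping the analysis and code models in sync.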

Other key principles of DDD include:

  • Iterative creation of the analysis and code models. As the team learns more about the domain, they iterate on both models and keep them in sync. DDD does not specify tools, databases, or languages; I have used UML (Unified Modeling Language) to create analysis models, while my code or implementation models were written in C++ and Java.

  • Close collaboration between the business and technical teams. Creating the relevant models requires close, face-to-face collaboration. This is a heavy commitment for all parties: developing the UL, using it to define the domain, iterating through that definition, and focusing on the problem instead of jumping directly to a solution.

  • Focus on the core domains – the core domains are those that will make the product a success, and a domain is core only if it is absolutely essential to the success of the business. You should be asking how this domain increases revenue, cuts costs, or improves efficiency, and why and how it is critical to the business.

 The problem you are solving must be substantial. There is no use in implementing DDD for problems that are insignificant, won’t move the needle for the business or are better solved with a COTS (commercial off-the-shelf) solution.

After you have begun to understand the business problem and developed models to define it, you will have to think about how to integrate bounded contexts. In their book Enterprise Integration Patterns (Addison-Wesley Signature Series), Gregor Hohpe and Bobby Woolf define four integration styles: file transfer, shared database, remote procedure invocation, and messaging. In most applications of substantial size, and for reasons of cohesion, your DDD and technical teams will most likely settle on remote procedure invocation and/or messaging for integration.
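As an illustration of the messaging style, here is a deliberately simplified Java sketch in which a hypothetical ordering context publishes a domain event that a billing context consumes. The in-memory EventBus stands in for whatever broker you would actually use, and all of the names are invented for the example (it also assumes Java 16 or later for records).

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // A deliberately simple, in-memory stand-in for a message broker, used only
    // to show the shape of messaging-style integration between bounded contexts.
    class EventBus {
        private final List<Consumer<Object>> subscribers = new ArrayList<>();

        void subscribe(Consumer<Object> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(Object event) {
            subscribers.forEach(subscriber -> subscriber.accept(event));
        }
    }

    // An event published by the (hypothetical) ordering context.
    record OrderPlaced(String orderId, String customerId) {}

    public class ContextIntegrationDemo {
        public static void main(String[] args) {
            EventBus bus = new EventBus();

            // The billing context reacts to events from ordering without sharing
            // a database or calling into ordering's internal model.
            bus.subscribe(event -> {
                if (event instanceof OrderPlaced placed) {
                    System.out.println("Billing: preparing invoice for order " + placed.orderId());
                }
            });

            // The ordering context publishes the fact that an order was placed.
            bus.publish(new OrderPlaced("ORD-1001", "CUST-42"));
        }
    }

The important point is the direction of the dependency: billing reacts to a published fact and never reaches into the ordering context's model or database.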

From domain-driven design to microservices, pairing these two approaches to solve large and complex problems makes good business sense.

The third and final post in this series is an overview of microservices tooling and can be found here.

Read More
Strategy Nick Vennaro

Entrepreneurs and Innovation

“Today, it is truer than ever that basic research is the pacemaker of technological progress”.

“Today, it is truer than ever that basic research is the pacemaker of technological progress”. This sentence was written in 1945 by Vannevar Bush in a report to the President entitled Science, the Endless Frontier. Bush lays out an argument for why and how the US Government can foster research activities by public and private institutions to maintain and enhance the country's competitive position. Bush did not want to conduct research just for the pursuit of knowledge; he knew it would lead to new products and practical uses and bolster our competitive position on the world stage: “New products and new processes do not appear full grown. They are founded on new principles and new conceptions, which in turn are painstakingly developed by research in the purest realms of science”. I read this report while I was in grad school, thinking about working in the public or private sector, and I had it in mind while reading Mariana Mazzucato’s book The Entrepreneurial State.

Mazzucato reminds us of the importance of Vannevar Bush’s idea of government investment in R&D, but she pushes beyond the notion of the government as a behind-the-scenes, deep-pocketed investor funding basic research: she advocates for governments to be recognized for the risk-taking, entrepreneurial, market-making activities they engage in. She provides some very interesting and compelling examples from domains such as computing, pharma, biotech, and clean energy to demonstrate that “innovations” from the private sector were really built on top of the hard work and investments done by, and/or funded by, the State. Mazzucato takes nothing away from the success of Apple but spends a chapter discussing the iPhone and how it was built on technologies pioneered by the US government – see the diagram below from The Entrepreneurial State.

[Figure: “What Makes the iPhone So Smart” from The Entrepreneurial State]

Taking nothing away from Apple’s technical, design, and business success, Mazzucato demonstrates the State’s role in the critical underpinnings that make the iPhone possible.

The US government in particular seems to suffer from an inferiority complex and thus has a marketing problem. The State has ceded the narrative to others and allowed taxpayers to forget how some major technologies were developed. Mazzucato goes on to discuss fiscal and tax policies that could let the government act as an active investor in the technologies it helps develop, funding the next ventures, recouping some expenses and, who knows, maybe having some money left over. She certainly has her critics, but she also has supporters on both ends of the political spectrum. It is definitely something to think about.

I have started a number of technology businesses over the years and must admit I continue to love the idea of the lone genius inventor out to change the world … maybe we have been taking too much credit?

Read More
Technical Nick Vennaro

Introducing Microservices into a Legacy Environment

This is the first in a three-part blog post series that discusses how to introduce microservices into a legacy environment.

While currently no consensus exists on how to define microservices, it’s generally agreed that they are an architectural pattern composed of loosely coupled, autonomous, fine-grained services that are independently deployable and communicate using a lightweight mechanism such as HTTP/REST. Now is the time for companies – particularly enterprises that need to make frequent changes to their systems and where time to market is paramount – to investigate how best to introduce microservices in their legacy environments if they expect to realize a digital transformation that drives tangible business results.

The benefits and potential hurdles associated with adopting microservices are well documented. On the plus side, the modular and independent nature of microservices enables improvements in efficiency, scalability, speed and flexibility. Detractors, however, frequently point to management and security challenges, especially when they pertain to customer-facing applications and services. 

Like virtually all technology decisions, it’s critical to balance risk with reward and, when it comes to microservices, to embrace an evolutionary approach. After all, lessons can be learned from both success and failure, and the same is true for implementing microservices, which can increase product and service quality, make systems more resilient and secure, and drive revenue growth. In the first of this three-part series, I’m going to explain how business and technology leaders can smoothly and successfully introduce microservices in a legacy environment.

It’s all about the monkey

A key requirement of microservices design is to focus service boundaries around application business boundaries. A keen awareness and understanding of service and business boundaries helps right-size services and keeps technology professionals focused on doing one thing and doing it very well. 

In my experience, the larger the organization, the greater the value a microservices architecture can deliver, but only if executed in a systematic, enterprise-wide fashion. Fortune 500 organizations tend to have a significant proliferation of legacy technologies and should strive to simplify deployment, along with applying continuous integration and delivery of microservices. All too often, enterprises focus their efforts on buying tools, implementing a small proof of concept, or pursuing other “quick wins” that likely aren’t the most effective place to initiate a microservices strategy.

Astro Teller, the “Captain of Google Moonshots”, has a humorous anecdote about where to begin when solving a large and complex problem: he advocates that companies avoid allocating all of their resources to the easy stuff and instead start by addressing the hard problems; he calls it “tackling the monkey first.” The monkey, when deploying microservices in a large, established environment, is understanding and decomposing the legacy systems.

Decompose the legacy environment by identifying seams

In the second part of this series I’ll cover domain-driven design (DDD), but for now it’s important to understand two concepts found in DDD: bounded contexts and domain models. 

Any problem domain is composed of a number of bounded contexts with models sitting inside them. The bounded contexts provide a level of protection and isolation of the models. In addition, the bounded context provides an interface to the model and controls what information is shared with other bounded contexts. For example, in an e-commerce application some of the bounded contexts may be ordering, pricing, or promotions.   
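A minimal sketch, assuming a hypothetical pricing context in Java, of how a bounded context can expose a narrow interface while keeping its internal model hidden; the interface, the calculator, and the pricing rules are invented for illustration.

    import java.math.BigDecimal;

    // The public face of the pricing context: the only thing other contexts,
    // such as ordering, are allowed to depend on.
    interface PricingService {
        BigDecimal priceFor(String sku, int quantity);
    }

    // Internal model of the pricing context; invisible outside the context.
    class TieredPriceCalculator implements PricingService {
        @Override
        public BigDecimal priceFor(String sku, int quantity) {
            BigDecimal unitPrice = lookUpListPrice(sku);
            BigDecimal discount = quantity >= 10 ? new BigDecimal("0.90") : BigDecimal.ONE;
            return unitPrice.multiply(BigDecimal.valueOf(quantity)).multiply(discount);
        }

        // Placeholder for whatever pricing data the context owns internally.
        private BigDecimal lookUpListPrice(String sku) {
            return new BigDecimal("19.99");
        }
    }

    public class BoundedContextDemo {
        public static void main(String[] args) {
            PricingService pricing = new TieredPriceCalculator();
            System.out.println("Price for 12 units: " + pricing.priceFor("SKU-001", 12));
        }
    }

The ordering context depends only on PricingService; how prices are actually calculated stays inside the pricing context and can change without rippling outward.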

Years ago I enjoyed reading “Working Effectively with Legacy Code” by Michael Feathers. In his book, the author presented the idea of a seam as a way to identify portions of code that can be modified without affecting the rest of the code base. This notion of seams can be extended as a method to divide a monolithic system into bounded contexts from which services can be quickly and seamlessly created.  

Uncovering seams in applications and building bounded contexts is an important first step in breaking down the monolith. Seam identification can be accomplished by reviewing the current code base, interviewing domain experts, and understanding the organizational structure. A few suggestions:

  • Review the current code base. When reviewing the current code base and any artifacts it’s critical to realize that this is only a starting point. The code is often redundant and difficult to understand. 

  • Interview domain experts. This is a key step to learning where the seams are and identifying bounded contexts. Having domain experts who understand what the business should be doing, not just what the system currently does, is critically important.

  • Understand the organizational structure – Often, organizational structure will provide clues to where the seams can be found.

 

Once bounded contexts are identified, along with the programming language and environment that support them, create packages and sub-packages that contain these bounded contexts. This approach affords a careful analysis of package usage and dependencies, which is essential to understanding the system and to ensuring that code is properly tested and instrumented. In addition, there are some standard design patterns that should be followed:

  • Open Host Pattern – expose the legacy system via a JSON/HTTP service. Here the isolated legacy system is exposed through an API that returns JSON.

  • Anti-Corruption Layer (ACL) Pattern – a translation layer, sometimes called a bridge layer, is built between the legacy environment and the microservices code. This pattern can be effective for short durations, but it can be costly to maintain over time. We have also called this layer “scaffolding”: it is needed during the transition but is taken down after the job is done. A rough sketch of such a layer appears below.
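Here is a minimal sketch of what such a layer might look like in Java. The legacy field names, the date format, and the Customer model are all hypothetical (the example also assumes Java 16 or later for records); the point is simply that every legacy quirk is confined to the translator.

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.util.Map;

    // The data as the legacy system returns it: flat, stringly typed, with
    // legacy field names. All of this is invented for illustration.
    class LegacyCustomerRecord {
        final Map<String, String> fields;

        LegacyCustomerRecord(Map<String, String> fields) {
            this.fields = fields;
        }
    }

    // The model used inside the new bounded context.
    record Customer(String id, String fullName, LocalDate customerSince) {}

    // The anti-corruption layer: all knowledge of legacy quirks lives here, so
    // the new services never see legacy names, formats, or codes.
    class LegacyCustomerTranslator {
        private static final DateTimeFormatter LEGACY_DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

        Customer toDomain(LegacyCustomerRecord legacyRecord) {
            String fullName = legacyRecord.fields.get("CUST_FNAME") + " " + legacyRecord.fields.get("CUST_LNAME");
            LocalDate since = LocalDate.parse(legacyRecord.fields.get("CUST_START_DT"), LEGACY_DATE);
            return new Customer(legacyRecord.fields.get("CUST_NO"), fullName, since);
        }
    }

    public class AclDemo {
        public static void main(String[] args) {
            LegacyCustomerRecord legacy = new LegacyCustomerRecord(Map.of(
                    "CUST_NO", "000123",
                    "CUST_FNAME", "Ada",
                    "CUST_LNAME", "Lovelace",
                    "CUST_START_DT", "20180401"));
            Customer customer = new LegacyCustomerTranslator().toDomain(legacy);
            System.out.println(customer);
        }
    }

The same translator can sit behind an open-host JSON/HTTP facade, so new services consume clean domain objects while the legacy system remains untouched.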

 That’s how microservices should be introduced in a legacy environment.

Read the second post in this series here

Read More