What is AWS? Why Every Data Science Professional Should Learn Amazon Web Services

MUKUL850 28 Sep, 2020
6 min read

Overview

  • Amazon Web Services (AWS) is the leading cloud platform for deploying machine learning solutions
  • Every data science professional should learn how AWS works

 

Introduction

“Your machine ran out of memory.”

Sound familiar? It certainly does to me – especially anytime I try to run a complex machine learning algorithm on my personal machine. It’s a frustrating experience that a lot of data science professionals share. We don’t have the unlimited computing power of the tech behemoths – so what should we do?

This is where the power of the cloud has transformed data science. And Amazon, with its AWS offering, has conquered the data science market like nothing before.

Amazon Web Services

Cloud computing has seen tremendous growth in the past few years. Almost every organization nowadays uses cloud computing for its wide range of services. 70% of all the money spent on tech is expected to go into cloud services by the end of 2020.

Did you know that AWS’s revenue in the first quarter of 2020 was $10 billion? That’s almost twice as much as its next closest competitor! Every data science professional, from a data scientist to a data analyst, needs to learn AWS and how it works.

So in this article, let’s dive into what AWS is and find out why it has come to the forefront of cloud computing services.

 

Table of Contents

  1. What is Amazon Web Services (AWS)?
  2. History of Amazon Web Services
  3. Services provided by Amazon Web Services
  4. Here’s why you can’t use your local system for all of your data tasks
  5. How can Amazon Web Services help you?

 

What is Amazon Web Services (AWS)?

AWS is a cloud computing platform by Amazon that provides services such as infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS) on a pay-as-you-go basis. It was launched in 2006, having originally been used to handle Amazon’s own online retail operations.

AWS has 3 main products:

  1. EC2 (Amazon Elastic Compute Cloud)
    EC2 allows users to rent virtual machines/servers on which they run their own applications. These servers are available with different operating systems, and Amazon charges you based on the computing power and capacity of the server (i.e. hard drive capacity, CPU, memory, etc.) and the duration the server has been up
  2. Glacier
    Glacier is a low-cost online file storage web service. Amazon Glacier is designed for the long-term storage of inactive data that will not need to be quickly retrieved
  3. S3 (Amazon Simple Storage Service)
    S3 provides object storage through a web service interface, with scalability and high speed being its main strengths

AWS provides its consumers with many advantages:

  • Security: AWS provides comprehensive security capabilities to satisfy the most demanding requirements
  • Compliance: AWS has rich controls, auditing, and broad security accreditation
  • Hybridism: It allows the building of hybrid architectures that extend the on-premises infrastructure to the cloud
  • Scalability: It allows scaling up and scaling down with ease
  • Pay-as-you-go: This means that you pay according to the services you use. Use less, pay less. Use more, pay more, but the per-unit price goes down as you scale up
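To make the pay-as-you-go point concrete, here is a minimal sketch of tiered pricing, where each block of usage is billed at its own rate so the average price per unit falls as usage grows. The tier boundaries and prices are made up for illustration and are not real AWS rates; the function names are hypothetical too.

```python
# Hypothetical volume tiers: (upper bound of the tier in GB, price per GB in USD).
# These numbers are illustrative only and are NOT real AWS prices.
TIERS = [
    (100, 0.10),            # first 100 GB
    (1_000, 0.08),          # next 900 GB
    (float("inf"), 0.05),   # everything beyond 1,000 GB
]


def monthly_cost(usage_gb: float) -> float:
    """Return the bill for `usage_gb`, charging each tier at its own rate."""
    cost = 0.0
    lower = 0.0
    for upper, price_per_gb in TIERS:
        if usage_gb <= lower:
            break  # no usage left that reaches into this tier
        billable = min(usage_gb, upper) - lower
        cost += billable * price_per_gb
        lower = upper
    return cost


def average_unit_price(usage_gb: float) -> float:
    """Average price per GB actually paid at this usage level."""
    if usage_gb <= 0:
        raise ValueError("usage_gb must be positive")
    return monthly_cost(usage_gb) / usage_gb


if __name__ == "__main__":
    for usage in (50, 500, 2_000):
        print(f"{usage:>6,} GB -> ${monthly_cost(usage):>8.2f} "
              f"(${average_unit_price(usage):.4f} per GB on average)")
```

The total bill still rises with usage, but the average price per GB drops at each tier, which is the "use more, pay more, but at a lower per-unit price" idea from the list above.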


 

History of Amazon Web Services (AWS)

AWS was initially launched in 2002, but it provided only a few services. In 2006, AWS launched its cloud products, which included Amazon S3 cloud storage, SQS (Simple Queue Service), and EC2, and in doing so marked its entry into the online core services industry.

By 2009, AWS had expanded internationally to Europe, where S3 and EC2 were launched. Elastic Block Store (EBS) and Amazon CloudFront, a content delivery network, were also released and incorporated into AWS.

EBS provides block-level storage for use with Amazon EC2 instances. EBS volumes are network-attached and persist independently of the life of an instance.

Over the years, many services were added to AWS, making it a cost-effective and highly scalable platform. AWS now has data centers all over the world, including in the United States, Japan, Europe, Australia, and Brazil.

AWS Global Infrastructure map

 

Services provided by Amazon Web Services

The following services are provided by AWS in the respective domains:

  1. Compute Services:
    • EC2 (Elastic Compute Cloud)
    • EKS (Elastic Kubernetes Service)
    • Lambda
    • Amazon Lightsail
    • Elastic Beanstalk
  2. Database Services:
    • Neptune
    • RDS
    • Aurora
    • Redshift
    • DynamoDB
    • ElastiCache
  3. Security Services:
    • KMS (Key Management Service)
    • AWS IAM (Identity and Access Management)
    • Inspector
    • WAF (Web Application Firewall)
    • Cloud Directory
    • Certificate Manager
    • Organizations
    • Shield
    • Macie
    • GuardDuty
  4. Storage Services:
    • Amazon Glacier
    • S3 (Simple Storage Service)
    • AWS Snowball
    • Elastic Block Store
  5. Migration Services:
    • Snowball
    • DMS (Database Migration Service)
    • SMS (Server Migration Service)
  6. Analytical Services:
    • Kinesis
    • QuickSight
    • EMR (Elastic MapReduce)
    • Data Pipeline
    • CloudSearch
    • Athena
    • Elasticsearch Service
  7. Management Tools:
    • CloudWatch
    • CloudFormation
    • CloudTrail
    • OpsWorks
    • Config
    • AWS Auto Scaling
  8. Messaging Services:
    • Pinpoint
    • SQS
    • SES
    • SNS

For more information on the services provided by AWS, see the official AWS product documentation.

By now you should have a broad understanding of what AWS is. So now, let’s shed some light on why companies require their data scientists to know AWS.

 

Here’s why you can’t use your local system for all of your data tasks

Remember when you were just sitting idle waiting for the system to respond? Here, we highlight a list of problems that your local systems must be able to overcome:

  1. The system on which you run your tasks has low processing power, which slows your work and puts deadlines at risk. You must have noticed this while processing huge volumes of data, and I am pretty sure the thought of an external, centrally managed system has crossed your mind
  2. Large datasets don’t fit into your system’s memory, which analytics and model training require. Remember when your Jupyter Notebook got stuck?
  3. It costs a lot both in terms of time and money to install and maintain your own hardware
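The memory problem in the list above can be checked with back-of-the-envelope arithmetic before loading anything. The sketch below assumes a dense, purely numeric table with 8 bytes per value (a 64-bit float) and an arbitrary rule of thumb that a dataset should use at most half of the available RAM. Both the function names and the 50% headroom are illustrative assumptions, not a standard.

```python
def estimated_size_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a dense numeric table, in GiB.

    Assumes every value takes `bytes_per_value` bytes and ignores index,
    string, and object overhead, so real usage is usually higher.
    """
    return n_rows * n_cols * bytes_per_value / 1024**3


def fits_in_memory(n_rows: int, n_cols: int, ram_gib: float,
                   headroom: float = 0.5) -> bool:
    """True if the estimate stays within `headroom` (a fraction) of the RAM."""
    return estimated_size_gib(n_rows, n_cols) <= ram_gib * headroom


if __name__ == "__main__":
    # 100 million rows x 50 numeric columns on a laptop with 16 GiB of RAM.
    size = estimated_size_gib(100_000_000, 50)
    print(f"Estimated size: {size:.1f} GiB")
    print("Fits on a 16 GiB machine:", fits_in_memory(100_000_000, 50, ram_gib=16))
```

With these assumptions, 100 million rows by 50 columns comes to roughly 37 GiB, well beyond a typical laptop, while 1 million rows of the same width needs well under 1 GiB.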

 

How can Amazon Web Services help you?

I am sure many of you are still wondering why you should use AWS. Why not go for something else (like Google’s GCP)? Let me answer this by listing the following benefits of AWS:

  1. User Friendly

    AWS has a very well-documented user interface and removes the need for on-site servers to meet your IT demands. This makes it easier to deploy programs and software whenever you need to.

  2. Diverse Tools

    Earlier in this article, we saw what a diverse range of services AWS has to offer. Given its efficiency, it’s an all-in-one solution for your IT and cloud requirements.

  3. Computing Capacity

    You don’t need to worry about whether large datasets will fit into your system’s memory or not.

  4. Infrastructure

    The AWS Global Cloud Infrastructure is the most extensive and reliable cloud platform, offering over 175 fully featured services from data centers globally. Whether you need to deploy your application workloads across the globe in a single click, or you want to build and deploy specific applications closer to your end-users with single-digit millisecond latency, AWS provides the cloud infrastructure you need, where and when you need it.

  5. Pricing

    I suspect this will be the most convincing point! AWS is one of the cheapest platforms for cloud services. This is really useful for small businesses, which can function and grow without allocating much working capital to servers.


2020 Gartner Magic Quadrant for Cloud Infrastructure and Platform Services

 

Why do companies emphasize AWS knowledge for their data scientists?

Whichever firm you work for, cloud infrastructure will become an important part of your daily data science routine, because companies are increasingly turning to cloud computing for solutions.

According to a report from Indeed.com, AWS rose from a 2.7% share in tech skills in 2014 to 14.2% in 2019. That’s a 418% change!

This is because of the pricing model on which AWS works. AWS works on a pay-as-you-go model and charges on either a per-hour or a per-second basis. It also provides an option to reserve a specific amount of computing capacity at discounted rates.
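The pricing options described above can be compared with a little arithmetic. The following is a minimal sketch, not AWS's actual billing logic: the hourly rate and the reserved discount are hypothetical values chosen for illustration, the function names are made up, and the reservation is assumed to be paid for every hour of the term whether or not it is used.

```python
import math


def on_demand_cost(seconds_used: int, hourly_rate: float,
                   per_second: bool = True) -> float:
    """Cost of one run under per-second or per-hour billing."""
    if per_second:
        return seconds_used * hourly_rate / 3600
    # Per-hour billing rounds every started hour up to a full hour.
    return math.ceil(seconds_used / 3600) * hourly_rate


def reserved_cost(hours_in_term: float, hourly_rate: float, discount: float) -> float:
    """Cost of reserving capacity for the whole term at a discounted rate."""
    return hours_in_term * hourly_rate * (1 - discount)


def cheaper_option(hours_used: float, hours_in_term: float,
                   hourly_rate: float, discount: float) -> str:
    """Return 'reserved' or 'on-demand', whichever costs less for this usage."""
    on_demand = hours_used * hourly_rate
    reserved = reserved_cost(hours_in_term, hourly_rate, discount)
    return "reserved" if reserved < on_demand else "on-demand"


if __name__ == "__main__":
    rate = 0.40      # hypothetical USD per hour, not a real AWS price
    discount = 0.40  # hypothetical reserved discount
    job = 90 * 60    # a 90-minute training job, in seconds
    print("per-second billing:", round(on_demand_cost(job, rate), 2))
    print("per-hour billing:  ", round(on_demand_cost(job, rate, per_second=False), 2))
    print("200 h in a 730 h month:", cheaper_option(200, 730, rate, discount))
    print("600 h in a 730 h month:", cheaper_option(600, 730, rate, discount))
```

Under these assumptions, short or occasional jobs favor on-demand billing, while a machine that is busy most of the month is cheaper to reserve.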

Additionally, AWS keeps in mind the prospective consumers who can’t afford its services. For them, it provides the AWS Free Tier service which allows them to gain hands-on experience with AWS services absolutely free.

All businesses, whether big or small, want to save costs. Small companies save the cost of buying servers, while conglomerates gain credibility and productivity. AWS services are also very powerful. Whereas it can take days to set up a Hadoop cluster with Spark yourself, AWS does it within a few minutes.

 

End Notes

In today’s competitive world, having hands-on experience with cloud services like AWS gives you a strong lead in the data science race. AWS is now very popular among businesses, and your experience with such cloud computing platforms highlights your skills during the recruitment process.


I hope this article serves as a solid argument for why cloud computing is necessary for data scientists. Please use the comment section below if you have any thoughts to share or general queries.



Responses From Readers


Colin Goyette 30 Sep, 2020

I cannot disagree with this opinion article more. People spend their entire careers studying and administering systems of IT infrastructure. Others spend their entire careers studying mathematics, statistical methods, theory of computer science, and areas of domain expertise. Arguing that every data science professional needs to also have professional level sysadmin or devops skillsets implies that building valuable and differentiated models also requires operating and maintaining the underlying machinery for the tasks that they alone are qualified for: studying data closely, performing experiments efficiently, and documenting their research clearly. Requiring these skillsets in parallel implies cultures of heroism, overwork, and a lack of collaborative culture. Simply put: let scientists be scientists.

Praveen Kumar M 30 Sep, 2020

Thanks, gives a quick background of AWS.

Jagannath Bulusu 02 Oct, 2020

Good article to understand AWS overview.

Amazon SP API 08 Feb, 2023

Good overview of AWS in this post.