The Hippie Magazine ETL (Extract, Transform, and Load) application is a Python application built on Amazon Web Services (AWS). In a nutshell, the application is an AWS Lambda function that extracts subscription information from emails with a specific subject line received in the company’s Gmail account, transforms and cleans the email data, and then uploads the data into a Google Sheets spreadsheet. The spreadsheet is used to keep track of customer subscriptions and magazine orders.
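The extract and load ends of that flow can be sketched as follows. This is a minimal sketch, not the application's actual code. It assumes the Gmail and Sheets clients are `googleapiclient` service objects (in Lambda they would be built with `googleapiclient.discovery.build("gmail", "v1", credentials=creds)` and `build("sheets", "v4", credentials=creds)`). The subject filter, the `Sheet1!A1` target range, and the helper names `fetch_email_bodies`, `append_rows`, and `run_etl` are all hypothetical. Passing the clients in as arguments keeps the extract and load steps testable without network access.

```python
import base64
from typing import Any, Callable, Dict, List

# Hypothetical subject line; the real filter used by the application is not given.
SUBJECT_FILTER = "Subscription notification"


def _find_html(payload: Dict[str, Any]) -> str:
    """Return the first text/html part of a Gmail message payload, decoded to str."""
    if payload.get("mimeType") == "text/html":
        data = payload.get("body", {}).get("data", "")
        # Gmail encodes bodies as base64url; restore any stripped padding before decoding.
        return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4)).decode("utf-8")
    for part in payload.get("parts", []):
        html = _find_html(part)
        if html:
            return html
    return ""


def fetch_email_bodies(gmail: Any, subject: str) -> List[str]:
    """Extract: return the HTML bodies of messages whose subject matches `subject`."""
    query = f'subject:"{subject}"'  # same search syntax as the Gmail web UI
    response = gmail.users().messages().list(userId="me", q=query).execute()
    bodies = []
    for ref in response.get("messages", []):  # key is absent when nothing matches
        message = gmail.users().messages().get(
            userId="me", id=ref["id"], format="full"
        ).execute()
        html = _find_html(message["payload"])
        if html:
            bodies.append(html)
    return bodies


def append_rows(sheets: Any, spreadsheet_id: str, rows: List[List[str]]) -> None:
    """Load: append rows below the existing data in the (hypothetical) Sheet1 tab."""
    sheets.spreadsheets().values().append(
        spreadsheetId=spreadsheet_id,
        range="Sheet1!A1",
        valueInputOption="USER_ENTERED",
        body={"values": rows},
    ).execute()


def run_etl(gmail: Any, sheets: Any, spreadsheet_id: str,
            transform: Callable[[str], List[str]]) -> int:
    """Extract matching emails, transform each body into a row, and load the rows."""
    rows = [transform(body) for body in fetch_email_bodies(gmail, SUBJECT_FILTER)]
    if rows:
        append_rows(sheets, spreadsheet_id, rows)
    return len(rows)
```

A `lambda_handler(event, context)` would then only need to build the two clients and call `run_etl`. Note that `messages.list` returns results one page at a time, so a mailbox with many matching emails would also need to follow `nextPageToken`.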
Hippie Magazine was created to provide content that focuses solely on the hippie movement. A new issue is published every three months with the goal of encouraging readers to embrace everything life has to offer, while providing useful and thoughtful information that can easily be applied to our everyday routines to make the world a better place. Hippie Magazine was first thought of in December 2020, and within one month, the first issue was launched.
The application itself follows a data engineering concept known as ETL. An ETL process is broken down into three separate stages: extract, transform, and load. As you might imagine, the extract stage is where your application pulls the data from some source, the transform stage is where you clean and format the data, and the final load stage is where you upload the data into its destination (a data warehouse, data lake, etc.). The application is written entirely in Python 3.8 and uses Google’s Gmail API to pull customer subscription information from PayPal emails, including cancelled subscriptions. We extract the contents of each email, such as customer names, customer addresses, and order dates, using regular expressions and a Python library called Beautiful Soup. The extracted content is then placed into a key/value format, which is converted into a pandas DataFrame. The pandas library is a powerful, easy-to-use, open-source data analysis and manipulation package for Python; in this application, I mainly use it to clean and format the extracted data. Loading this code into AWS Lambda was a fairly simple process. All that was required was to zip the code together with the packages it uses (pandas, Beautiful Soup, etc.) and upload the zipped file to an Amazon S3 bucket so that the Lambda function can load the code from there.
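The parsing and cleaning steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the application's code: the labels `Name:`, `Address:`, and `Order date:` and the `January 5, 2021` date format are hypothetical stand-ins for whatever the PayPal emails actually contain, and the standard library's `html.parser` is swapped in for Beautiful Soup so the example has no third-party dependencies. The helper names are likewise hypothetical.

```python
import re
from datetime import datetime
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Strip tags, putting each text node on its own line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical labels. Because \s* also matches a newline, the value may sit on the
# same line as its label or on the next line (e.g. in the next table cell).
FIELD_PATTERNS = {
    "customer_name": re.compile(r"Name:\s*(.+)"),
    "address": re.compile(r"Address:\s*(.+)"),
    "order_date": re.compile(r"Order date:\s*(.+)"),
}


def parse_subscription(html: str) -> Dict[str, str]:
    """Extract the labelled fields into a key/value record ('' when a label is missing)."""
    text = html_to_text(html)
    record = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[key] = match.group(1).strip() if match else ""
    return record


def clean_record(record: Dict[str, str]) -> Dict[str, str]:
    """Collapse whitespace, title-case the name, and convert the date to ISO format."""
    cleaned = {key: " ".join(value.split()) for key, value in record.items()}
    cleaned["customer_name"] = cleaned["customer_name"].title()
    if cleaned["order_date"]:
        # Hypothetical input format, e.g. "January 5, 2021".
        parsed = datetime.strptime(cleaned["order_date"], "%B %d, %Y")
        cleaned["order_date"] = parsed.date().isoformat()
    return cleaned
```

With Beautiful Soup, `html_to_text` could be replaced by `BeautifulSoup(html, "html.parser").get_text("\n", strip=True)`. A list of the resulting dictionaries can be passed straight to `pandas.DataFrame(records)`, which creates one column per key, and the cleaning could then be done on whole DataFrame columns instead of record by record.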
An immediate improvement that I would add to the application is the ability for the Lambda function to be triggered automatically, with no human intervention needed. This can be done quite easily by creating an Amazon EventBridge rule with a cron expression that triggers the function at a specific time each day. Unfortunately, the existing permissions on my AWS account did not allow me to automate the Lambda function.
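The scheduling step described above could look roughly like the following with boto3. This is a sketch under assumptions: the rule name, function name, ARN, and time of day are hypothetical, and the account running it needs IAM permissions such as `events:PutRule`, `events:PutTargets`, and `lambda:AddPermission`, which is the kind of access that was missing here. The boto3 clients are passed in rather than created inside the function so the logic can be exercised without AWS credentials. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), are evaluated in UTC, and require a `?` in either the day-of-month or the day-of-week field.

```python
from typing import Any


def daily_cron(hour: int, minute: int = 0) -> str:
    """Return an EventBridge schedule expression that fires once a day at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minutes hours day-of-month month day-of-week year;
    # '?' in day-of-week because day-of-month is already '*'.
    return f"cron({minute} {hour} * * ? *)"


def schedule_lambda(events: Any, lambda_client: Any, rule_name: str,
                    function_name: str, function_arn: str,
                    hour: int, minute: int = 0) -> str:
    """Create a daily EventBridge rule and point it at the Lambda function."""
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=daily_cron(hour, minute),
        State="ENABLED",
    )["RuleArn"]
    # Resource-based permission that lets EventBridge invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "etl-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

In practice this would be called as `schedule_lambda(boto3.client("events"), boto3.client("lambda"), ...)` with real names filled in. The `add_permission` call matters: without it the rule would fire but could not invoke the function. Running the sketch a second time would fail at `add_permission`, because a statement with the same ID already exists.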
The pandemic caused by COVID-19 has truly changed our world. Not only have we unfortunately lost many lives to this horrendous virus, but it has also changed the way many of us live. Going out to eat and meeting people just is not the same, and many of us throughout the world have had to readjust to a virtual lifestyle. Personally, I readjusted to a completely virtual lifestyle, which was definitely something I was not used to; I hadn’t even taken an online class before all of this! Even though it was an abnormal time, most of which was spent not knowing what would happen next, Cal State Channel Islands provided stability for me and many other students. The transition from in-person to virtual learning could not have been more seamless, and the constant updates truly helped alleviate a ton of stress during that time. At the height of the pandemic, when almost everything was shut down, I found myself with more free time than I had had in years, mainly because I was let go from my job due to the unforeseen circumstances. Knowing that I should make use of all this free time, I took the initiative to learn more about Python and Amazon Web Services. As a student, I had already briefly covered these two areas of information technology in various courses over the years; however, I never truly grasped how powerful cloud computing services could be in conjunction with everyday programming languages. Although COVID-19 has been a horrible time for me and many others, it gave me the time to learn new skills that significantly influenced my capstone project.