Getting started with an AI automation project: Some pointers for an MBA student
A student from my AI, Business, and Society class recently asked for guidance on creating a personal project to automatically curate and share memories from Google Photos to Instagram. Just a few years ago, this project would have required significant coding expertise, familiarity with multiple APIs, and weeks of development time. Today, it's accessible to anyone with curiosity and willingness to experiment.
AI tools now enable creative individuals to build practical applications with minimal coding background. This approach lets you focus primarily on the vision and architecture while AI assists with implementation details.
In this post, I'm sharing advice I gave to this student about getting started on such a project. The guidance applies broadly to many personal automation projects. It's about leveraging AI to bridge the gap between your ideas and their technical implementation.
Leave a comment if you find this helpful!
Project Overview
This project sits at the intersection of programming and modern AI capabilities. While the project will require code, large language models (LLMs) can assist you throughout the development process in two key ways:
Coding assistance - LLMs can help write, explain, and debug virtually all the code you'll need
Direct functionality - LLMs can analyze your photos, understand content, and help create reel concepts
This means you won't have to write code entirely on your own, even for the technical components. Let me break down how I'd approach this:
Getting Started: The Foundation
If you're new to coding, I recommend starting with the "AI Python for Beginners" short course from deeplearning.ai: https://www.deeplearning.ai/short-courses/ai-python-for-beginners/
This class will introduce you to coding and to using large language models through APIs. The course might not teach you how to use Claude's API specifically, but you can easily find information about it online. Claude itself can help you produce API code for your project.
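To make the API part concrete, here is a minimal sketch of calling Claude through the Anthropic Python SDK. It assumes `pip install anthropic` and an `ANTHROPIC_API_KEY` environment variable; the model name is illustrative, so check Anthropic's current model list before using it.

```python
# Minimal sketch of asking Claude for a reel concept via the Anthropic SDK.
# The model name below is illustrative -- verify it against current docs.

def build_reel_request(photo_captions):
    """Assemble the messages payload asking Claude for a reel concept."""
    photo_list = "\n".join(f"- {c}" for c in photo_captions)
    prompt = (
        "Here are short descriptions of photos from my album:\n"
        f"{photo_list}\n\n"
        "Suggest a 15-second Instagram reel concept using these photos: "
        "an ordering, a caption, and a music mood."
    )
    return [{"role": "user", "content": prompt}]

def ask_claude(photo_captions):
    from anthropic import Anthropic  # imported here so the sketch runs without the SDK installed
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # illustrative model name
        max_tokens=500,
        messages=build_reel_request(photo_captions),
    )
    return response.content[0].text
```

Separating the prompt-building function from the API call makes it easy to experiment with prompts (for example, in the Workbench) before wiring up the call itself.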
For this particular project, you can likely get started with just this short course since you can begin simply and learn as you go. It's not a bad idea to just play around with the code and APIs, using AI tools to help you along the way.
If you prefer more structured learning, I recommend the "Generative AI for Software Development" specialization on Coursera: https://www.deeplearning.ai/courses/generative-ai-for-software-development/
This specialization goes into more depth and will give you more building blocks for this project, though at the cost of a longer time commitment before you can actually start working on your project.
Technical Components
Remember our lecture on Software Engineer Workflow? This project will require all three core tasks we discussed:
Architectural decisions - Designing the overall system (LLMs can help, but human guidance is crucial)
Coding - Writing the actual implementation (LLMs excel here)
Debugging - Testing and fixing issues (LLMs are valuable for this)
For a prototype project like yours, LLMs can be especially helpful with architecture since optimization isn't critical at this stage. I'd recommend using Claude models rather than ChatGPT for this project, as Claude excels at coding tasks. The recent release of Claude 3.7 Sonnet makes this perfect timing!
Architecture Suggestion
Here's how I'd structure the initial system:
Photo Selection Module
Connect to Google Photos API
Implement selection logic (start simple—perhaps 10 photos from a specific date range)
Consider both technical constraints and product requirements
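As a sketch of the selection logic, here is a simple date-range filter. The dict shape mirrors the Google Photos Library API's media items (each carries a `mediaMetadata.creationTime` ISO timestamp), but treat the field names as an assumption to verify against the live API.

```python
# A simple selection rule: keep up to 10 photos whose creation time falls
# in a chosen date range. The item structure imitates Google Photos API
# responses; confirm field names against the real API before relying on them.
from datetime import datetime, timezone

def select_photos(media_items, start, end, limit=10):
    """Return up to `limit` items created between `start` and `end`."""
    chosen = []
    for item in media_items:
        created = datetime.fromisoformat(
            item["mediaMetadata"]["creationTime"].replace("Z", "+00:00")
        )
        if start <= created <= end:
            chosen.append(item)
        if len(chosen) == limit:
            break
    return chosen
```

Starting with a rule this simple keeps the focus on getting the end-to-end workflow running; smarter selection criteria can come later.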
Content Creation Module
Pass selected photos to an LLM along with your business logic prompt
The prompt should define what kind of reel you want to create
Allow for user editing of the prompt (can be implemented later)
This is the most creative part. What kind of reels do you want to design? What are the templates you want a large language model to consider? How do you want the template to depend on what kinds of photos it sees? There are so many exciting possibilities here that I encourage you to explore.
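One way to encode this kind of business logic is to pick a reel template from the dominant photo category and bake it into the prompt. The categories and templates below are entirely made up; swap in your own.

```python
# Hypothetical template logic: choose a reel style from the most common
# photo category, then ask the LLM to fill in the details.

TEMPLATES = {
    "travel": "Fast-cut montage with location captions and upbeat music.",
    "food": "Slow pans over each dish, ending with a recipe teaser.",
    "friends": "Candid sequence with a nostalgic caption and soft music.",
}

def build_prompt(photo_tags):
    """photo_tags: e.g. ['travel', 'travel', 'food'] -> prompt string."""
    dominant = max(set(photo_tags), key=photo_tags.count)
    template = TEMPLATES.get(dominant, "A simple chronological slideshow.")
    return (
        f"Most of these photos look like '{dominant}' shots. "
        f"Design a reel using this template: {template} "
        "List the photo order, caption, and transitions."
    )
```

The template table is the natural place to express your creative choices, and the prompt itself is where you can later expose editing to the user.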
Publishing Module
Translate the creation instructions into Instagram API calls
Provide the LLM with API documentation to understand possibilities and limitations
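For orientation, here is a sketch of Instagram's two-step publishing flow for reels (create a media container, then publish it), following the shape of Meta's Graph API content-publishing docs. The API version, permissions, and exact parameters are assumptions to verify against the current documentation.

```python
# Sketch of the two-step Instagram Graph API flow for publishing a reel.
# The API version and parameter names should be checked against Meta's
# current content-publishing documentation before use.

GRAPH_URL = "https://graph.facebook.com/v19.0"  # version is an assumption

def container_params(video_url, caption, access_token):
    """Params for POST /{ig-user-id}/media to create a reel container."""
    return {
        "media_type": "REELS",
        "video_url": video_url,
        "caption": caption,
        "access_token": access_token,
    }

def publish_reel(ig_user_id, video_url, caption, access_token):
    import requests  # pip install requests
    container = requests.post(
        f"{GRAPH_URL}/{ig_user_id}/media",
        data=container_params(video_url, caption, access_token),
    ).json()
    published = requests.post(
        f"{GRAPH_URL}/{ig_user_id}/media_publish",
        data={"creation_id": container["id"], "access_token": access_token},
    ).json()
    return published
```

Pasting the relevant documentation pages into the LLM, as suggested below, is a good way to get these calls right for the current API version.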
Development Approach
Start small and iterate:
Begin with a specific Google Photos album containing just 10 photos
Get the basic workflow functioning end-to-end
Once you can successfully publish a simple reel, expand functionality
Address scaling concerns later (handling large albums, refining selection criteria, etc.)
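The iterate-small approach can be made literal: wire the three modules together with stub implementations first, confirm the pipeline runs end to end, then replace each stub with real API calls. All function bodies below are placeholders.

```python
# Skeleton pipeline with stubbed modules. Get this running first, then
# swap each stub for the real Google Photos, Claude, and Instagram calls.

def select_photos_stub():
    return [f"photo_{i}.jpg" for i in range(10)]  # stands in for a fixed 10-photo album

def create_reel_concept_stub(photos):
    return {"photos": photos, "caption": "First test reel"}  # LLM call goes here later

def publish_stub(concept):
    return f"published reel with {len(concept['photos'])} photos"  # Instagram call later

def run_pipeline():
    photos = select_photos_stub()
    concept = create_reel_concept_stub(photos)
    return publish_stub(concept)
```

Having a runnable skeleton also gives you obvious seams for debugging: when something breaks, you know which module to inspect.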
I like experimenting when I'm designing LLM components. I personally use Claude's Workbench, which has more functionality than the classic Claude interface. It also gives you the API command for actions you perform through the interface. For your project, I would experiment by uploading 10 specific pictures you're interested in and trying different prompts that produce reel descriptions. Once you're happy with the results, you can get the API code and incorporate it into your app, where you'll take the photos from the previous step and use the reel description to create the actual reel via the Instagram API.
https://console.anthropic.com/workbench
Resources
The API links you shared are excellent starting points.
Note that in many cases it can be valuable to paste parts of documentation or the whole documentation directly into a large language model, especially if documentation gets updated after the model's training data.
Next Steps
Take the AI Python for Beginners class if you don't have prior coding experience
Set up your development environment
Create a simple script to authenticate and access your Google Photos
Experiment with photo selection criteria
Design your first reel template
Implement the Instagram publishing workflow
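For the authentication step, here is a sketch using the `google-auth-oauthlib` package to run the OAuth consent flow and then list a few media items. It assumes a `client_secret.json` downloaded from the Google Cloud Console; the scope and endpoint follow the Google Photos Library API docs, but verify both against the current documentation before use.

```python
# Sketch of authenticating with Google Photos and fetching recent items.
# Assumes `pip install google-auth-oauthlib requests` and a
# client_secret.json from the Google Cloud Console. Scope and endpoint
# should be verified against the current Library API docs.

SCOPES = ["https://www.googleapis.com/auth/photoslibrary.readonly"]
API_URL = "https://photoslibrary.googleapis.com/v1/mediaItems"

def list_params(page_size=10):
    """Query params for the first page of media items."""
    return {"pageSize": page_size}

def fetch_recent_photos():
    import requests
    from google_auth_oauthlib.flow import InstalledAppFlow
    flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
    creds = flow.run_local_server(port=0)  # opens a browser for user consent
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {creds.token}"},
        params=list_params(),
    )
    return resp.json().get("mediaItems", [])
```

Once this script returns your photos, you have the raw input for the selection module and can build outward from there.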