AI + Ideation
I’ve decided to add a new page to my portfolio focusing on AI, data science, and my ideation process — enjoy!
IBM Data Science Professional Certificate
A couple years before ChatGPT hit the scene I was beginning to get interested in data science. With the advent of LLMs I’m so glad I took the time to learn about how all of this works under the hood. It is downright fascinating to see it in action.
Below you will find my IBM Data Science Professional Cert verification. My favorite aspect of the program was linear regression — least squares and gradient descent were a blast. I was surprised to find myself doing the math on paper for fun. At the end of the day, LLMs really are just statistics under the hood — stochastic gradient descent with a trillion weights. I love it. If you’d like to understand at a fundamental level how LLMs come to their “conclusions”, I highly recommend completing a certificate like the one below.
Credly Verification: https://www.credly.com/users/jeremy-romanowski
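Since least squares and gradient descent were the highlight for me, here is a minimal NumPy sketch of fitting a line exactly that way; the toy data and hyperparameters are invented purely for illustration:

```python
import numpy as np

# Toy data: y = 3x + 2 plus noise (invented for this sketch)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 3 * x + 2 + rng.normal(0, 1, 100)

# Fit y ≈ w*x + b by minimizing mean squared error with batch gradient descent
w, b = 0.0, 0.0
lr = 0.01  # learning rate

for _ in range(2000):
    error = (w * x + b) - y
    grad_w = 2 * np.mean(error * x)  # ∂MSE/∂w
    grad_b = 2 * np.mean(error)      # ∂MSE/∂b
    w -= lr * grad_w
    b -= lr * grad_b

print(f"fit: y ≈ {w:.2f}x + {b:.2f}")  # lands near the true y = 3x + 2
```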
Microsoft Copilot — Superagent Vision
As mentioned in the Design section, the goal for this project was to campaign for the development of an AR Copilot Superagent.
The AR version of Copilot would conceivably intake 6 different sensor datatypes available in the HoloLens, identifying patterns and co-patterns with a high confidence rating, to ultimately produce an action (or, often, inaction). To identify these patterns, each datatype would be ingested by an individual prediction model (likely trained via gradient descent), trained and reinforced by user feedback, ultimately generating a unique-to-the-user weighting pattern and resulting in the most intelligent and situationally aware personal assistant possible.
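As a rough illustration of that weighting idea, here is a minimal sketch of fusing per-datatype confidences into an act-or-stay-quiet decision and reinforcing the weights from user feedback; the datatype names, threshold, and update rule are all my own placeholder assumptions, not a proposed production model:

```python
# All names, thresholds, and the update rule below are illustrative assumptions;
# a real system would learn these from far richer signals.

DATATYPES = ["audio", "rgb_video", "spatial", "user_behavior", "location", "time"]

# Unique-to-the-user weighting pattern, refined by feedback over time
weights = {d: 1.0 / len(DATATYPES) for d in DATATYPES}
ACTION_THRESHOLD = 0.8  # act only on high combined confidence; inaction is the default

def decide(confidences: dict[str, float]) -> bool:
    """Weighted vote across the per-datatype prediction models."""
    score = sum(weights[d] * confidences.get(d, 0.0) for d in DATATYPES)
    return score >= ACTION_THRESHOLD

def reinforce(confidences: dict[str, float], user_approved: bool, lr: float = 0.05) -> None:
    """Nudge weight toward datatypes whose confidence matched the user's feedback."""
    direction = 1.0 if user_approved else -1.0
    for d in DATATYPES:
        weights[d] = max(weights[d] + lr * direction * confidences.get(d, 0.0), 0.0)
    total = sum(weights.values()) or 1.0
    for d in DATATYPES:
        weights[d] /= total  # renormalize so weights stay comparable across updates
```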
What does each datatype provide us?
Audio
Speaker ID — Who said what?
e.g. Which meeting attendees should receive an automated follow-up meeting invite?
Social Dynamics — What did they mean?
Sentiment Analysis — Was that sarcasm? e.g. “Ship it!” (Opportunity for Autism Spectrum Disorder assistance)
Conversation State — Is this a good time to surface suggestions?
Environmental — What’s competing with the UX? How does the environment affect assistant behavior?
Loud venue — Increase volume
Quiet space — Decrease volume, perhaps tonality (i.e. “hushed voice”)
Venue: Airport — Smart-pause during announcements (Copilot accessing airport’s API)
RGB Video
Object / Subject ID
Where did I leave my keys?
What was that woman’s name at the meeting with the connection to Contoso?
Intent Inference
Recognizing common patterns of body language which imply future behaviors (TSA, LEA, Autism, etc.)
Audio + Video = Context²
Two models potentially in statistical “agreement” means the assistant can become gradually more confident in taking action while continuing to uphold the UX pillar of appropriate communication.
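To see why agreement compounds confidence, here is a toy illustration (not a claim about Copilot's internals): if the two modality models are treated as roughly independent evidence, combining them in log-odds space pushes the joint confidence above either model alone.

```python
import math

def log_odds(p: float) -> float:
    return math.log(p / (1 - p))

def fuse(p_audio: float, p_video: float) -> float:
    """Naive Bayes-style fusion of two roughly independent model confidences."""
    joint = log_odds(p_audio) + log_odds(p_video)
    return 1 / (1 + math.exp(-joint))

print(fuse(0.75, 0.80))  # ≈ 0.92: two agreeing models beat either one alone
```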
Spatial Patterns
Contextual understanding of Environmental Surfaces
Knowing where and what surfaces are nearby is key to communicating information both to ourselves and others.
“Turn that wall into a presentation screen.”
“Place nav markers on the floor to my airport terminal.”
Contextual understanding of Human Surfaces
For human surfaces, some examples may be:
“Attach my shopping list to my wrist.”
“Display my LinkedIn username on my shirt.”
Audio + Video + Spatial = Context³
3 models potentially in agreement
User Behavioral Patterns
User engagement with AI Agent
Did the user engage with the agent’s attempt to communicate?
User engagement with Apps
What is the user looking at? What is the user saying at that time?
What app tool are they using? Are they executing a repetitive task?
User Interests
The user asks about the latest news in Ukraine every morning, how can we accommodate?
User behaviors change with time + location
A user’s needs at a bus stop are different than when they are sitting at their desk
Audio + Video + Spatial + User Patterns = Context⁴
4 models potentially in agreement
Location / Proximity Patterns
Meaningful Locations — Passively sharing information with others based on physical location
Office: Full Name, Title, Disciplines, Current Project, Do not disturb, etc.
Social Outing: First Name, Hobbies, Relationship status, etc.
Library: Name of the book you’re trying to find; library AI assistant guides you to its location
Proximity to other Users / Devices — Passively sharing information based on the presence of other users’ devices
Allow my friend to listen to the audiobook I’m currently playing on Spotify
AI proactively suggests conversations (e.g. Steve is an expert in ML; my assistant knows to suggest striking up a conversation based on my proximity to his device and my interests.)
Audio + Video + Spatial + User Patterns + Location = Context⁵
5 models potentially in agreement
Time Patterns
Time of Day — What content is relevant to the user right now?
“Traffic is backed up on I-405, you may want to get out the door a bit earlier than usual.”
Time Until — What does the user’s future look like?
“You have a presentation in an hour, would you like to practice with me beforehand?”
Time Elapsed — What does the user’s past look like?
“When did Maria say was a good day to meet up?”
Audio + Video + Spatial + User Patterns + Location + Time = Context⁶
6 models potentially in statistical agreement
A Day with AR Copilot
Utilizing all sensor types, we can imagine a day in the life with Copilot (below). Following the creation of this script I recreated several of the vignettes, two of which are available in the design section.
Waking up with Copilot
Kai (user-chosen name of Copilot) wakes you up at the ideal time within your sleep cycle (given an acceptable range)
Instead of picking up your phone, you put on your AR glasses
*Notification chime* comes from glasses on nightstand
“Good morning, Ari! Sleep well?”
“I had the strangest dream! Remember that deck we worked on yesterday? I was presenting it to the VP and the slides were all blank!”
“Oh no! I bet that was a bit unnerving. I’m sure the presentation will go great today. I just checked the contents of the deck, rest assured they are not blank. There have been some changes from Jocelyn though, latest version was published at 7PM. Would you like to review the changes now?”
“Caffeine, Kai, let’s start there.”
Morning routine with Copilot
Looking at the coffee options: “The Ecuador blend in the back has been open the longest.” — “I forgot I even had that one, thanks Kai.”
Waiting for the coffee to brew while looking out a window with a view, *Notification Chime*
From the periphery, profile photos / chosen embodied appearances of friends stack up in a traversable vertical ribbon, accompanied by reaction icons and snippets of comments, videos, etc.
OR display the information directly in a consistent location (on the window pane, on the wall, etc.), although this seems less in line with the anywhere-anytime value of a wearable.
Finger gestures scroll through the ribbon, with a *pop* animation plus stars/emojis/highlights to draw attention to a high-priority update: “Adam posted about the Capitol Hill Block Party coming up on July 19, and it just so happens one of your top played artists from Spotify will be performing. Want to know more?”
“Yes!”
The block party website is summarized by AI and the key information is displayed in a modal view, such as the dates, artists, and ticket prices, using the theme and key graphics of the website automatically! It might also surface videos of last year’s Block Party sourced from Instagram / FB.
Commuting with Copilot
At the bus stop, an AR status is displayed next to the sign or on the wall of the bus stop awning which shows the current location of the bus on a mini map with an ETA of arrival along with stops further down the route. “Based on current conditions you’ll be arriving at your final stop at approximately 8:47AM, and your first meeting begins at 9:15AM.”
“Thanks Kai.”
A traffic incident on the bus route is delaying Ari. A progress bar shows her current timeline to arrival; she selects it and opens the minimap to look at the incident location. “Looks like we’ll be cutting it a bit close. Want to take the call en route?”
“Probably a good idea, let’s review the deck changes now.”
*Acknowledgement Chime*
The deck is displayed in front of Ari, with overlaid affordances showing a profile face next to each changed slide and a snippet of the change: text, perhaps an image-swap icon, or a new slide altogether (always use motion when possible). Double-highlight wherever Ari is mentioned, perhaps with a waving hand emoji or exploding stars.
Ari reviews the changes, but isn’t totally clear on one of the changes from Jocelyn. “Kai, give Jocelyn a call; include a note that it’s high priority, about the deck.” *Calling chime*
“Hey Ari, what’s up?”
“Yeah I just saw the comment in the RFP deck, you mentioned a list of suppliers which I’ve not seen yet. Can you forward that now?”
“Oh yes, I forgot! Sending now.”
RFP document is displayed next to the deck for all call participants
(Accessory documentation, not directly related to the current conversation)
“Perfect, thanks!”
“Sure thing, let me know if you have any questions, I’ll be available.”
Ari takes the call on the bus in meeting mode: her view of the physical world is blurred, noise cancellation is automatically set to maximum, and she’s transported to a quiet place to have a meeting.
Ari calls in with a realistic avatar which is displayed in the meeting room. She can see everyone in the room due to a dedicated camera ‘sitting’ at the table. Possibly attached to the table top and raised automatically for any and all remote callers to use.
Everyone else in the meeting room is able to see her as if she was embodied and physically present. When she turns her head, the camera may also turn -- although this raises the question of a first-class experience for only 1 remote caller at a time.
Ari presents the deck to the team
As she’s exiting the bus, Ari gets a connection dot above another passenger’s head. Kai has reason to believe the two of them should strike up a conversation; in this case, Ari has a handbag for sale.
Out to eat with Copilot
Ari asks the assistant what’s good here; Kai says they’re known for their comfort food choices, but based on her nutritional goals she may want to consider the Cobb salad.
Assistant proactively calls out allergies on the menu
Drink choice is made and displayed above the head, allowing the server to begin the drink order prior to greeting.
Communication preference is displayed above the head (not a talker, focusing, etc.)
Gaze at someone and say “connect us” and it allows them to see information about you, perhaps your relationship status, hobbies, etc.
Efficient shopping with Copilot
Ari asks Kai what at-home dinner options are available; not particularly pleased with the options, Ari decides to pick up groceries on the way home
Generating a grocery list, either:
Inferred by Kai based on which items are low, near expiration, or missing from the kitchen (and not scheduled to be delivered by a different food service, such as Amazon Fresh)
Generated by Kai with the context of a collaborative meal plan, the current groceries at home, and cuisine preferences
Ari enters the grocery store and Kai offers to augment the grocery store with floor markers to guide her through the store as efficiently as possible to each item on her list
Kai also highlights key items on sale in the store which are staples in her home — Kai is intelligent enough to understand the expiration date of items on sale vs how much of that item is left at home — maximizing product use and minimizing time and cost to Ari
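As a back-of-the-napkin illustration of the list-inference idea above, here is a sketch of flagging items that are low, near expiration, or missing, unless a delivery is already scheduled; the item fields, dates, and thresholds are all invented:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical pantry-inventory record Kai might maintain per household staple
@dataclass
class PantryItem:
    name: str
    quantity: float          # fraction remaining, 0.0-1.0
    expires: date
    incoming_delivery: bool  # already scheduled via another food service?

def build_grocery_list(pantry: list[PantryItem], today: date) -> list[str]:
    """Flag items that are low or near expiration, skipping scheduled deliveries."""
    needed = []
    for item in pantry:
        if item.incoming_delivery:
            continue  # e.g. already coming via Amazon Fresh
        low = item.quantity < 0.2
        expiring = item.expires <= today + timedelta(days=3)
        if low or expiring:
            needed.append(item.name)
    return needed

pantry = [
    PantryItem("milk", 0.1, date(2025, 7, 1), incoming_delivery=False),
    PantryItem("eggs", 0.9, date(2025, 8, 1), incoming_delivery=False),
]
print(build_grocery_list(pantry, today=date(2025, 6, 30)))  # ['milk']
```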
UR Learnings & Insights
The personality of an assistant agent is a core component of building an affinity between the user and the assistant — with this affinity, mistakes are forgivable, even endearing, and the user, in fact, feels the assistant is more intelligent than it “really” is.
Contrived responses are the quickest way to lose the user. Most people want authenticity, not patronization, although everyone is different, and we should allow for that possibility. This means an agent must evolve with you, just like a friend or any other relationship that is built gradually in our lives.
Gender-ambiguous voices generally aren’t favored by users; respondents did not want to see the work continue.
Human avatars are powerful, but come with baggage. The presence of a human-sized avatar can feel imposing and even threatening, especially when it exists in a 6DoF environment which allows for occlusion. Walking into a room and being met by a human-sized, human-shaped individual you weren’t expecting is extremely jarring. Ensuring the user has either control of, or a primed expectation of, where, when, and how the embodied agent is manifested is key to building affinity, a sense of familiarity and predictability, and a generalized comfort zone around the agent.
Adjust speaking speed, vocabulary, tone, humor type, interests, and core motivations, and use GPT to manipulate the persona in lieu of a more bespoke personality system (sketched after this list). This means digital assistants require a robust data collection schema.
The name of the assistant should be chosen by the user; this further bolsters affinity while also ensuring the wake word is comfortably spoken by the user.
The baseline personality type should adapt to the user — however, offering an option to fine-tune the big-5, or potentially choosing from the Myers Briggs personality types, is an appealing proposition to users.
Just because it’s an LLM doesn’t mean we should always lean into lingual feedback. Chimes and sonic feedback can fulfill that function with significantly less cognitive load. The name of the game is to minimize the glucose in the brain necessary to process information.
Less is more: Most importantly of all — a good assistant does not interject unless there is very high confidence; assistants should reduce noise and amplify signal; lower priority information should be passively retrievable at opportune times.
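To make the prompt-driven persona idea above concrete, here is a rough sketch of rendering stored persona settings, including big-5 sliders, into a system prompt; the field names and prompt wording are my own placeholders, not a shipped schema:

```python
from dataclasses import dataclass

# Hypothetical persona record the assistant would build up over time
@dataclass
class Persona:
    name: str             # user-chosen name / wake word
    speaking_speed: str   # "slow" | "moderate" | "fast"
    humor: str            # e.g. "dry", "playful", "minimal"
    openness: float       # big-5 sliders, 0.0-1.0
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

def to_system_prompt(p: Persona) -> str:
    """Render persona settings as an LLM system prompt in lieu of a bespoke personality system."""
    return (
        f"You are {p.name}, a personal assistant. Speak at a {p.speaking_speed} pace "
        f"with {p.humor} humor. Personality calibration (0-1): "
        f"openness {p.openness}, conscientiousness {p.conscientiousness}, "
        f"extraversion {p.extraversion}, agreeableness {p.agreeableness}, "
        f"neuroticism {p.neuroticism}. Interject only when highly confident."
    )

kai = Persona("Kai", "moderate", "playful", 0.8, 0.7, 0.5, 0.9, 0.1)
print(to_system_prompt(kai))
```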
Identity-Based Assistants
Different Identities = Different Algorithms
Our lives are attached to our identities, and those identities are the most powerful seed for understanding the user’s needs. We should anticipate different sensor patterns for different identities. For example, the user’s Work Identity is separate from their Social Identity; this would extend to sub-identities, such as being a volunteer at a pet shelter, participating in a group sport, or attending a social club meetup. How we work also comprises different identities — such as being the presenter in a meeting versus the audience, or a manager versus an individual contributor.
By design these identities should be managed by Copilot, and where possible, auto-switched by the assistant based on sensor data patterns. This data can also be used to suggest available “Assistant Apps” to Copilot; e.g. a “Copilot PowerPoint Agent” app surfaced while in a brainstorming meeting, featuring real-time composition of the slides while listening to the session, with or without input from the session participants.
Different identities would naturally have different augmentation and assistance models — these can be managed company-wide in the form of templates, and fine-tuned by team-level organizers. Social identities are also often organizationally based, meaning we should be thinking about all identities in two overarching categories — organizationally managed and independently curated.
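A minimal sketch of what sensor-driven identity auto-switching could look like; the feature names, profiles, and nearest-profile scoring are assumptions for illustration, and a real system would learn these patterns rather than hand-code them:

```python
# Illustrative identity profiles keyed by coarse sensor-derived features
IDENTITY_PROFILES = {
    "work":      {"location": "office",  "calendar": "meeting", "company": "colleagues"},
    "social":    {"location": "venue",   "calendar": "free",    "company": "friends"},
    "volunteer": {"location": "shelter", "calendar": "free",    "company": "strangers"},
}

def infer_identity(observed: dict[str, str]) -> str:
    """Pick the identity whose profile matches the most observed sensor features."""
    def score(profile: dict[str, str]) -> int:
        return sum(1 for k, v in profile.items() if observed.get(k) == v)
    return max(IDENTITY_PROFILES, key=lambda name: score(IDENTITY_PROFILES[name]))

print(infer_identity({"location": "office", "calendar": "meeting", "company": "colleagues"}))  # work
```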
Better Law Enforcement with A.I.
Next to medicine, law enforcement is perhaps the most impactful use case of AI for everyday life. Desired outcomes include ensuring the safety of citizens and bystanders, reducing false positives, increasing true positives, and streamlining day-to-day operations. I conducted the following ideation exercise to extract all of the possible ways AI could be used to improve law enforcement outcomes.
Ideation
1) Bodycam Data Collection + AI Inference
Auto-labeling of subject ID:
Allows the bodycam to utilize facial recognition to passively query criminal photo databases which, when coupled with high-priority alerts (warrants, etc.), can surface information to the officer at the time of need, improving situational awareness in real time.
Auto-labeling of scene objects to aid evidence retrieval:
Allows for queries such as "Find video assets related to the recent hit-and-run with a red pickup."
Audio capture of voices matched to subject ID — ground truth established by video of the individual while speaking:
Enriches transcription and positively identifies out-of-frame speakers.
ML models for optical classification of captured media:
Shoe print registry / tire mark registry / tool mark registry, automotive paint color registry, etc.
Again, this can run continuously as a background process, ingesting new registry entries while looking for matches to unresolved cases and, upon a match, sending a notification to all law enforcement personnel associated with the case for human review (see the sketch at the end of this section).
ML models for matching gait:
Gait can be a reliable identifier — body, vehicle, or CCTV camera feeds automatically and continuously compare video of individuals’ gaits against persons of interest.
OCR trained against individual officer handwriting (for those LEA who prefer handwriting):
The officer handwrites “The quick brown fox jumps over the lazy dog”, which is then captured by camera as a training sample.
Allows field notes to be reliably transcribed to standardized digital formats and LEA forms.
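Here is a rough sketch of the background registry-matching loop described above; every function and type is a hypothetical stand-in for a real registry feed, classifier, and notification service, not a real Axon API:

```python
import time
from dataclasses import dataclass

# Everything below is a hypothetical stand-in; none of these are real Axon APIs.

@dataclass
class Match:
    case_id: str
    entry_id: str
    confidence: float

def new_registry_entries() -> list[str]:
    """Poll for unmatched shoe-print / tire-mark / tool-mark registry entries."""
    return []  # placeholder: would pull from the registry feed

def compare_to_open_cases(entry_id: str) -> list[Match]:
    """Run the optical classifier against media from unresolved cases."""
    return []  # placeholder: would score the entry against case media

def notify_case_personnel(match: Match) -> None:
    """Surface the hit to everyone on the case for human review; never auto-act."""
    print(f"review needed: entry {match.entry_id} vs case {match.case_id} ({match.confidence:.0%})")

def background_match_loop(threshold: float = 0.9, poll_seconds: int = 300) -> None:
    while True:
        for entry_id in new_registry_entries():
            for match in compare_to_open_cases(entry_id):
                if match.confidence >= threshold:
                    notify_case_personnel(match)
        time.sleep(poll_seconds)
```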
2) Asset Standardization + AI Interoperability
Text / Video / Audio encoders:
Procedural operations homogenize all rich media evidence for interoperability between LEA RMS implementations
Transform documentation from all police precincts into a universal format for both LEA & AI models:
Background or in-situ scanning of documents / screens / evidence bags / video → CV subject and object ID predictions → text / labels / metadata extracted → SLM/LLM preprompt instructions to identify keywords / fields of interest (e.g. “VIN”, “Charge”, “Age”) to then determine the appropriate universal form type (evidence chain of custody, traffic stop, etc.); a sketch of this pipeline follows this section
Personnel-in-the-loop approval ✓
If other agencies use alternative records management systems and are not able or willing to adopt Axon, modify an existing LLM transformer to passively ingest their evidence and make it available to the Axon ecosystem.
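A compressed sketch of the pipeline above; the keyword table and form types are placeholders, and simple keyword scoring stands in for the SLM/LLM preprompt step purely so the example runs end to end:

```python
# Placeholder keyword table: which fields of interest suggest which universal form
UNIVERSAL_FORMS = {
    "evidence_chain_of_custody": ["evidence", "custody", "seal"],
    "traffic_stop":              ["vin", "charge", "age", "plate"],
}

def classify_document(extracted_text: str) -> str:
    """Map extracted text to the universal form type with the most keyword hits."""
    text = extracted_text.lower()
    def hits(keywords: list[str]) -> int:
        return sum(kw in text for kw in keywords)
    return max(UNIVERSAL_FORMS, key=lambda form: hits(UNIVERSAL_FORMS[form]))

def ingest(scan_text: str) -> dict:
    """CV/OCR output in, draft universal record out; a human approves before commit."""
    return {
        "form": classify_document(scan_text),
        "source_text": scan_text,
        "approved": False,  # personnel-in-the-loop approval gate
    }

print(ingest("VIN 1HGCM82633A004352, Charge: speeding, Age: 34")["form"])  # traffic_stop
```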
3) Asset Discoverability & Insights
Database matching officer number and name to accelerate communication and asset retrieval
Allows for queries such as “connect me with Detective Nguyen at the North Seattle precinct.”
Natural language processing in concert with LLM to decrease retrieval time
"Pull up my traffic stops from last Thursday on Aurora Ave"
“Connect me with the detective assigned to the hit-and-run case I filed in Q4 of 2022.”
“4 matches found, do you recall any other details of the case?”
“I believe there was a blue sedan involved.”
“Match found. Detective Marcus Johnson — connecting now.”
Recommendation algorithm to find similar or directly related assets, e.g. case files of subject ID
Upon reaching an acceptable confidence level for a subject ID, a notification is sent to LEA personnel to review insights.
LLM parsers independently query the Axon Evidence database to provide recommended COAs:
Essentially, LLMs + ML models = continuous background processing of evidence to extrapolate missed-connections between cases, persons of interest, police and prosecution personnel, and evidence, 24/7 without human input (obviously compute is a concern).
LLM multi-disciplinary bot army:
IBM recently unveiled a multi-disciplinary, multi-agent approach to resolving complex problems (e.g. an LLM instructed to evaluate a problem as a physicist agent, then the same asked of a mathematician agent, then a higher-level agent instructed to solve the original problem using the insights from both the physicist and mathematician agents).
This may come in the form of:
Officer agent
CSI agent
Records conformance agent
Rights conformance agent
Paralegal agent
County Clerk agent
Criminal defense attorney agent
Prosecutorial agent
Judge agent
Upon ingestion of insights from each agent, an executive summary of insights and recommended courses of action is produced, with significantly improved accuracy and logic compared to a single universal agent (sketched at the end of this section).
Reveal previously unforeseen issues with case files — evidence chain of custody, clerical errors, etc.
Reveal historical case precedence insights
Case strategy recommendations
Attorney / judge profile generation to anticipate rulings based on previous similar cases (feeds into case strategy)
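A skeletal sketch of that agent-army flow; the role prompts and the ask_llm() stub are illustrative placeholders of my own, not IBM’s published method or a real API:

```python
# Role list comes from the bullets above; ask_llm() and the prompt wording are
# illustrative placeholders, not a real API.

ROLES = [
    "Officer", "CSI", "Records conformance", "Rights conformance", "Paralegal",
    "County Clerk", "Criminal defense attorney", "Prosecutorial", "Judge",
]

def ask_llm(prompt: str) -> str:
    """Stand-in for a call to whichever LLM backs each agent."""
    raise NotImplementedError

def review_case(case_file: str) -> str:
    # Each role-specialized agent reviews the case file independently...
    insights = {
        role: ask_llm(f"As a {role} agent, review this case file and flag issues:\n{case_file}")
        for role in ROLES
    }
    # ...then an executive agent synthesizes the insights into recommended courses of action.
    combined = "\n\n".join(f"[{role}]\n{text}" for role, text in insights.items())
    return ask_llm(
        "You are an executive agent. Produce an executive summary and "
        f"recommended courses of action from these insights:\n{combined}"
    )
```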
4) Data Sharing
Inter-agency "Follow-Up Service" to recursively identify assets which should be shared with other agencies or officers with historical relevance.
E.g. Another agency or officer was part of a previous case involving the individual I stopped in traffic today — this information is sent as a notification to the relevant parties
If you made it to this point — I thank you for your interest in my work and I hope we get a chance to work together!