Daniel Otero

AI Engineer · Data Scientist · Applied Researcher

I build production systems across three fronts — agentic AI (LangGraph, RAG, multi-agent), NLP and text analytics (semantic search, embeddings, bibliometric networks), and applied data science (clustering, dashboards, pipelines) — for research and product teams across Latin America.

Bridging social science and AI

I'm an economist and computer-science engineer (M.Sc.) working across three fronts: agentic AI (LLM orchestration, RAG, multi-agent systems), NLP and text analytics (semantic search, embeddings, bibliometric networks), and applied data science (clustering, statistical modeling, dashboards). My path moves between them — sometimes within a single project.

That breadth means I do the technical work and understand the social, organizational, and research context behind it. I've shipped conversational agents serving hundreds of users monthly across Latin America, ML clustering pipelines for survey research, RAG systems with vector search, and 6 monitoring dashboards across 4 countries for data-capture and impact-evaluation processes.

Currently leading data science and AI at Estudio Plural — designing LLM-based tools for behavioral research, knowledge retrieval, and organizational intelligence. I publish peer-reviewed work on bibliometric NLP, teach, and consult on applied research projects when there's a good fit.

700+ · Active WhatsApp bot users / month
8 · Countries reached with data systems
104K · Nodes in bibliometric citation network
4 · Peer-reviewed publications

What I work with

Agentic AI & LLMs
LangChain · LangGraph · RAG · Multi-agent · Prompt Engineering · Fine-tuning · Hugging Face · OpenRouter
NLP & Text Analytics
Embeddings · Semantic Search · Text Classification · Sentiment Analysis · Network Analysis · Bibliometrics
Data Science & Stats
Python · R · Pandas · scikit-learn · Plotly · Clustering · PCA · Statistical Modeling
Agentic Coding Systems
Claude Code · Codex · OpenCode
Infrastructure & Storage
FastAPI · Streamlit · Next.js · Docker · GitHub Actions · n8n · Twilio · PostgreSQL · MongoDB · Qdrant · Neo4j

Selected projects

Production systems across three fronts — agentic AI, NLP, and data science — built for research and product teams.

Agentic NLP
puddleAsistant
Production

Semantic search over 6 educational documents on gender and parenting. MongoDB vector store + OpenAI embeddings. Multilingual WhatsApp bot with conversation memory in Supabase. 7 specialized agents, 1,544 processed chunks.

Active in CO & MX · ~300 users / month
RAG · Embeddings · MongoDB · Supabase · Twilio · Multi-agent
Data Science
SGR Dashboard
Live

Interactive dashboard for Colombia's General Royalties System (SGR). Real-time data from datos.gov.co via Socrata API, dynamic filters, choropleth maps, and Excel export. Deployed on Streamlit Cloud.

Saves the project formulation team 8 hours / week
Streamlit · pydeck · Plotly · Socrata API · GeoJSON
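A minimal sketch of how a paged Socrata query against datos.gov.co could be assembled. This is not the dashboard's actual code: the dataset ID `abcd-1234`, the column name `departamento`, and the helper `build_sgr_query` are hypothetical placeholders. Only the `/resource/<id>.json` endpoint shape and the `$limit`, `$offset`, and `$where` parameters come from the Socrata API itself.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.datos.gov.co/resource"  # Socrata endpoint root


def build_sgr_query(dataset_id: str, department: str,
                    limit: int = 1000, offset: int = 0) -> str:
    """Return a query URL for one page of rows filtered by department."""
    params = {
        "$limit": limit,    # page size
        "$offset": offset,  # rows to skip, for paging
        # `departamento` is an assumed column name, not taken from the real dataset.
        "$where": f"departamento = '{department}'",
    }
    # Keep the `$` prefix readable instead of percent-encoding it.
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params, safe='$')}"


# Hypothetical dataset ID, for illustration only.
url = build_sgr_query("abcd-1234", "VALLE DEL CAUCA", limit=500)
print(url)
```

Fetching successive pages would mean calling the helper again with a larger `offset` until a page comes back with fewer than `limit` rows.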
Data Science
Cali Electoral Map
Live

Interactive map of Cali's 339 neighborhoods with the 2026 first-round presidential results. Official Registraduría tally (216 polling stations, 5,158 tables) geolocated to neighborhood level by cross-referencing IDESC's WFS school layer + OpenStreetMap geocoding. Leaflet, static site on Vercel.

339 neighborhoods · 1M+ official votes mapped
Leaflet · GeoJSON · Python · Shapely · Open Data
Agentic Data Science
archetypeSuite
Live

End-to-end ML pipeline for archetype discovery. LangGraph orchestrates ingestion → profiling → preprocessing → algorithm selection → clustering → LLM-generated narrative. 33 automated tests passing.

Cuts survey analysis time by 50%
Python · LangGraph · scikit-learn · Streamlit · OpenRouter
Agentic NLP
Aly — WhatsApp AI Agent
MVP

Multilingual bot (ES/EN/PT) for Equimundo's A+P Manual. 5 sequential LLM agents: language detection → intent classification → specialized response (factual, planning, ideation, sensitive topics). Built with FastAPI + LangGraph.

Active across 4 countries · ~400 users / month
FastAPI · LangGraph · Text Classification · MongoDB · Twilio
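The sequential flow described for Aly (language detection, then intent classification, then a specialized response) can be illustrated with a small routing sketch. It is a plain-Python stand-in, not the production LangGraph code: the keyword rules replace the LLM calls, the five agents are collapsed into three steps with four response branches, and every name here (`detect_language`, `classify_intent`, `HANDLERS`, `run_pipeline`) is hypothetical.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical keyword rules standing in for the LLM call at each step.
LANGUAGE_HINTS: Dict[str, Set[str]] = {
    "es": {"hola", "cómo", "gracias"},
    "pt": {"olá", "você", "obrigado"},
}
INTENT_HINTS: Dict[str, Set[str]] = {
    "planning": {"plan", "schedule", "agenda"},
    "ideation": {"idea", "ideas", "brainstorm"},
    "sensitive": {"violence", "abuse", "violencia"},
}


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens, so matching works on whole words only."""
    return set(re.findall(r"\w+", text.lower()))


def detect_language(text: str) -> str:
    """Step 1: pick a language code, defaulting to English."""
    words = tokens(text)
    for lang, hints in LANGUAGE_HINTS.items():
        if words & hints:
            return lang
    return "en"


def classify_intent(text: str) -> str:
    """Step 2: pick an intent, defaulting to a factual question."""
    words = tokens(text)
    for intent, hints in INTENT_HINTS.items():
        if words & hints:
            return intent
    return "factual"


# Step 3: one specialized responder per intent (placeholders for LLM agents).
HANDLERS: Dict[str, Callable[[str, str], str]] = {
    "factual": lambda text, lang: f"[{lang}] answer grounded in the manual",
    "planning": lambda text, lang: f"[{lang}] step-by-step session plan",
    "ideation": lambda text, lang: f"[{lang}] list of activity ideas",
    "sensitive": lambda text, lang: f"[{lang}] careful reply with referral info",
}


def run_pipeline(text: str) -> Dict[str, str]:
    """Run the three steps in order and return the routing decisions."""
    language = detect_language(text)
    intent = classify_intent(text)
    reply = HANDLERS[intent](text, language)
    return {"language": language, "intent": intent, "reply": reply}


result = run_pipeline("Hola, necesito un plan para la sesión")
print(result["language"], result["intent"])
```

Keeping the responders in a dictionary keyed by intent means a new branch can be added without touching the routing function.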
Data Science Agentic
AMA Survey Pipeline
Active

Multi-city survey processing pipeline for social field research across 4 cities in Colombia, Peru, Ecuador, and Bolivia. KoboToolbox integration, validation, deduplication, LLM-generated reports, and 30+ charts for interim deliverables.

4 cities across 4 countries · 30+ charts auto-generated
Python · pandas · KoboToolbox · LLM · Social Research
Agentic NLP
convocatorias-bot
Production

Automated daily scanner of 15+ funding and grant sources. Claude AI filters by organizational relevance, deduplicates results, and sends curated alerts to Slack. Runs on GitHub Actions every morning.

Saves the project formulation team 10 hours / week
Python · BeautifulSoup · Claude AI · Slack API · GitHub Actions
Data Science
Aly Dashboard
Live

Operational monitoring dashboard for the Aly (Apapáchar) WhatsApp bot. KPIs with sparklines and deltas, geographic visualization, alert flags with Excel export and review-status toggle, and a leaderboard with drill-down. Multi-page Streamlit app with custom navigation and i18n.

Real-time bot monitoring for the research team
Streamlit · Supabase · Plotly · PostgreSQL · Python
Data Science Agentic
AMA Lineabase 2026
Active

Field-survey validation pipeline for the AMA program in Leticia (Colombia) and Cobija (Bolivia). KoboToolbox QC, ID validation, duration outlier detection per classroom, attendance crosschecks vs Google Forms, school-level Excel reports, and LLM-generated narrative summaries via OpenRouter.

2 cities across 2 countries · automated QC + reporting
Python · pandas · KoboToolbox · OpenRouter · CLI
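As an illustration of the per-classroom duration check, here is a minimal sketch using the interquartile-range (IQR) rule. The IQR rule is an assumption chosen for the example, since the method used in the pipeline is not stated above. The field names (`id`, `classroom`, `duration_min`) and the function `flag_duration_outliers` are likewise hypothetical, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List


def flag_duration_outliers(rows: List[Dict], k: float = 1.5,
                           min_group: int = 4) -> List[str]:
    """Return IDs of surveys whose duration is unusual within their classroom."""
    by_classroom = defaultdict(list)
    for row in rows:
        by_classroom[row["classroom"]].append(row)

    flagged = []
    for group in by_classroom.values():
        # Too few surveys to estimate quartiles reliably: skip the classroom.
        if len(group) < min_group:
            continue
        durations = [r["duration_min"] for r in group]
        q1, _, q3 = quantiles(durations, n=4)  # quartile cut points
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flagged.extend(
            r["id"] for r in group if not low <= r["duration_min"] <= high
        )
    return flagged


# Made-up sample data: classroom A has one very long survey.
sample = [
    {"id": f"A{i}", "classroom": "A", "duration_min": d}
    for i, d in enumerate([18, 19, 20, 20, 21, 22, 23, 90])
] + [
    {"id": f"B{i}", "classroom": "B", "duration_min": d}
    for i, d in enumerate([5, 40, 90])  # below min_group, so never flagged
]
flagged_ids = flag_duration_outliers(sample)
print(flagged_ids)
```

Computing the fences within each classroom, rather than over the whole sample, keeps a slow or fast classroom from masking or inflating individual outliers.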
Data Science
AMA Bot Monitoring
Live

Monitoring dashboard for the AMA WhatsApp bot. Tracks user activity, sessions, and engagement across deployments. Streamlit + Supabase backend with Plotly visualizations and Excel exports.

Real-time tracking of bot activity for the research team
Streamlit · Supabase · SQLAlchemy · Plotly · Python
Agentic
agentChatBuilder
In Dev

No-code SaaS platform for building multi-agent chatbots with multi-channel deployment (WhatsApp, Telegram, Web). Full UI in Next.js + shadcn/ui; FastAPI backend with MongoDB Atlas and Supabase auth.

Next.js · TypeScript · FastAPI · MongoDB · Supabase

Where I've worked

Estudio Plural
  • Data Science & AI Specialist Jan 2025 – Present
  • Behavioral Research & Analytics Lead Jun 2024 – Dec 2024
  • Data Analytics Consultant Dec 2023 – May 2024
  • Data Analytics Consultant Jul 2023 – Aug 2023
  • Built 2 conversational agents with LangGraph: first deployed across 4 countries (CO/EC/PE/BO) with ~400 users/month; second active in CO & MX with ~300 users/month.
  • Real-time monitoring dashboards with leaderboards and automatic report generation, giving researchers immediate access to bot usage metrics with no manual extraction.
  • Automation flows in n8n and Zapier for admin and accounting → 80% time saved on repetitive tasks.
  • AI system for automatic detection of funding opportunities → 10 hours/week saved for the project formulation team.
  • Python data pipelines connecting KoboToolbox & Typeform to Supabase dashboards → ~90% reduction in field-data monitoring time.
  • Multi-agent processing system for clustering and behavioral narrative generation over survey data → 50% reduction in analysis time.
Octopus Force
  • Project Analyst Jan 2025 – Mar 2026
  • Research Leader Jul 2023 – Dec 2024
  • Built SGR (General Royalties System) monitoring dashboard integrated with the datos.gov.co Open Data API → 8 hours/week saved for the project formulation team.
  • Developed a prompt library for technology surveillance → research time per report cut from 8 to 3 days (-63%), applied across ~20 reports for companies in Valle del Cauca.
  • Deployed intelligent agents for information synthesis and organization across research, search, and project formulation in public and corporate contexts.
  • Built MVP of a multi-agent assistant for document management, focused on classification and efficient access to technical and administrative documents.
Universidad del Valle, CIDSE
  • Data Analysis Consultant Nov 2024 – Dec 2024
  • Data Analytics & Experimental Design Consultant Oct 2023 – Dec 2023
  • Shiny Developer Oct 2020 – Dec 2020
  • Sample design, construction and deployment of experimental surveys in oTree, with results processing using clustering algorithms for computational social science projects.
  • Narrative and social-network analysis using NLP and text mining in R.
  • Interactive Shiny dashboards for non-technical research teams.
Tell Business Storytelling
  • Data Analytics Consultant Mar 2024 – Jun 2024
  • Data Scientist Mar 2020 – Dec 2021
  • Designed and analyzed an end-to-end Typeform survey including final report → 50% time reduction vs. previous process.
  • Automated survey processing and report generation across 6 countries (CL, SV, CO, MX, UY, PE) → 90% time saved for the research team.
  • Built data capture system via Twitter API + Google Trends/News → weekly collection cut from 2 days to 20 minutes.
  • NLP and text mining pipelines for sentiment analysis, clustering, and user-persona construction over the captured data.
Universidad del Valle, CINARA Institute
  • Quantitative Analytics Lead · PUDA2022 Project Sep 2021 – Jun 2023
  • Sample design and construction of socio-environmental surveys; processed results applying PCA and clustering for data characterization in water and sanitation contexts.
  • Network models and fuzzy logic systems applied to complex socio-environmental systems.
Fundación Univalle
  • Advisor Sep 2020 – Nov 2020
  • Applied clustering and PCA to SISBEN IV data for categorization and georeferencing of vulnerable populations, with technical reporting.
Directrix Analytics
  • Data Scientist · HORIZONT Project Nov 2018 – Apr 2019
  • Automated industrial sensor data capture from a kiln at the ARGOS Yumbo (Valle del Cauca) plant, enabling continuous monitoring of process variables.
  • PCA and predictive models on the captured data; visualization dashboards for plant teams.
CIDSE, Universidad del Valle
  • Research Assistant Jan 2017 – Jul 2020
  • Automated download of scientific citation data from the RePEc API, building a network of 104,589 nodes → 6 months of manual work saved.
  • Built semantic models and citation networks in R and Python for bibliometric and influence analysis in economics.
  • Co-author of 4 peer-reviewed publications in international journals (see Publications).

Publications

Bibliometric NLP and citation-network analysis applied to economic discourse — 104K+ nodes across four peer-reviewed studies.

The Drifting Influence of Hall's Random Walk Hypothesis on Consumption Modeling
García, C., Otero, D. & Salazar, B. · History of Political Economy, 55(1), 103–143 · 2023
doi.org/10.1215/00182702-10213653 ↗
A Tale of a Tool: The Impact of Sims's Vector Autoregressions on Macroeconometrics
Salazar, B. & Otero, D. · History of Political Economy, 51(3), 557–578 · 2019
doi.org/10.1215/00182702-7551924 ↗
La revolución empírica en economía
Salazar, B. & Otero, D. · Apuntes del CENES, 38(68) · 2019
doi.org/10.19053/01203053.v38.n68.2019.8792 ↗
La revolución de los nuevos clásicos: redes, influencia y metodología
Salazar, B. & Otero, D. · Revista de Economía Institucional, 17(32), 39–69 · 2015
doi.org/10.18601/01245996.v17n32.02 ↗

Let's work together

Open to consulting, research collaborations, and new projects — especially where AI, data, and social impact intersect.