Back to Blog

Making sense of social attribution

6 min readWired Sixth Intelligence
#data-engineering#aws-glue#s3#marketing-ops#quadrant-analysis

Making sense of social attribution

The attribution gap we didn't see coming

We had a marketing team watching posts go viral. Engagement numbers looked great. But when asked "Which posts actually drove revenue," we had no idea.

That is the problem with most social media dashboards: they are islands. Likes, shares, reach, sentiment, each metric lives in its own corner of the internet, and none of them talk to each other. You end up making decisions based on whatever number looks best that week.

Here's the ugly truth: our performance data sat in Looker with strict access controls. Our social sentiment lived in Sprinklr, exported as CSVs every morning. Different teams, different tools, different stories.

So we built something to bridge the gap. Not perfect automation, pragmatic workarounds that actually work in a messy enterprise environment.

Making sense of the mess: the quadrant

Before touching any code, we needed a framework marketers could actually use. We plot normalized reach against normalized mentions, and suddenly the chaos has a pattern.

What the quadrants actually mean:

  • High Performers (High reach, high mentions): Double down. These posts are your winners.
  • Platform Push (High reach, low mentions): The message is spreading but people aren't talking about it. Consider adjusting the content.
  • Leave It (Low reach, low mentions): Content isn't resonating. Kill it or fix it.
  • Underdogs (Low reach, high mentions): Your audience loves it, but not enough people see it. Amplify these.

The stack we chose

We went hybrid: serverless for heavy historical data, PostgreSQL for fast-moving cache.

ComponentWhy we picked it
AirflowPython-based orchestration, industry standard
Sprinklr exportsDaily CSVs give us cross-account mentions
StreamlitFast Python web apps stakeholders can actually use
PostgreSQLFast caching for uploaded CSVs (first stream & account reach)
Amazon S3Cheap storage for messy, multi-format data
AWS GlueETL without paying for a database we don't need
AthenaQuery layer for Sprinklr mentions data

The pattern: Sprinklr data (large, historical) goes S3 → Glue → Athena. Uploaded CSVs (small, frequently refreshed) go Streamlit → PostgreSQL. The dashboard joins them on the fly.

How the Pipeline Fits Together

Getting the data in

Sprinklr data goes into S3 first, then through Glue ETL, into Athena. The scheduled CSV exports from Sprinklr land here, automated, reliable, queryable.

First Stream & Account Reach takes a different path. Marketers upload CSVs directly through Streamlit, which stores them in PostgreSQL. The data gets cached, refreshed on demand, and stays close to where people actually work.

Cataloging the chaos

AWS Glue lets us treat S3 folders like SQL tables without actually running a database. Schema-on-read means we don't pay thousands for infrastructure that changes every time a social platform tweaks their export format.

The actual logic

The heavy lifting happens when data meets. Sprinklr data lives in Athena (S3-backed). First Stream & Account Reach lives in PostgreSQL (Streamlit cache). The join happens on the fly when someone queries the dashboard.

# Join Athena (Sprinklr mentions) with PostgreSQL (cached reach data)
def correlate_intelligence():
    """
    Fetches Sprinklr mentions from Athena, cached reach from PostgreSQL.
    Joins and normalizes on the fly.
    """
    # Query Sprinklr mentions from Athena
    mentions_df = query_athena(
        "SELECT * FROM social_mentions WHERE date >= CURRENT_DATE - 7"
    )
    
    # Fetch cached reach data from PostgreSQL
    reach_df = query_postgres("SELECT * FROM account_reach_cache")
    
    # Join on date/account
    joined = mentions_df.merge(reach_df, on=['date', 'account'])
    
    # Normalize and categorize
    norm_reach = joined['reach'] / joined['reach'].max()
    norm_mentions = joined['mentions'] / joined['mentions'].max()
    
    # Categorize based on median performance
    if norm_reach > median_reach:
        return "High Performers" if norm_mentions > median_mentions else "Platform Push"
    else:
        return "Leave It" if norm_mentions > median_mentions else "Underdogs"

The key insight: We don't move all the data into one place. We query where it lives and join on the fly. PostgreSQL handles the fast-changing reach data; Athena handles the historical Sprinklr mentions. The dashboard stitches them together when you need them.

Streamlit makes it actionable

This is where it clicks. Our Streamlit dashboard is designed for action. Marketers hover over dots and immediately see which campaign gets more budget versus which one needs a PR fix.

Upload flow: CSVs uploaded through Streamlit land in PostgreSQL, cached for fast queries. The cache refreshes when needed, not on a rigid schedule, so teams can upload fresh data whenever they want.

Query flow: When someone opens the dashboard, it queries Athena for Sprinklr mentions and pulls from PostgreSQL for cached reach. The join happens in real-time, and results populate instantly.

What we found

High Performers (High reach, high mentions): These are your winners. Pour more fuel on the fire.

Platform Push (High reach, low mentions): The message is spreading but people aren't talking about it. Consider adjusting the content.

Leave It (Low reach, low mentions): Content isn't resonating. Kill it or fix it.

Underdogs (Low reach, high mentions): Your audience loves it, but not enough people see it. Amplify these.

What changed

MetricBeforeAfter
Response timeDays of manual reconciliationNear real-time
Data ownershipVendor lock-inOwned logic
Team alignmentSiloed teams, different storiesOne view, one conversation
DecisionsGut feeling, vanity metricsCorrelation-based, data-backed

What we learned

  1. Solve for politics first. Perfect automation is a myth in enterprise settings. Build for the "semi-automated" reality where humans still click buttons and email CSVs.

  2. Own your logic. Don't let some BI tool decide what "successful" looks like. Define your own quadrants based on what actually matters to your business.

  3. Keep it serverless. S3, Glue, and Athena give you a stack that's cheap to run and scales when you need it. No database maintenance. No hardware upgrades.

The point

This pipeline turns social media data from isolated metrics into actual insights. Marketing teams stop guessing and start knowing.

The goal isn't just data. It's clarity that moves the needle.

Thanks

This project started with late-night brainstorms about marketing bottlenecks. Haikal (Head of Digital) pushed us to bridge the attribution gap. Jerry and the social media team gave us practical feedback that shaped every decision. Bernice, our intern, worked alongside us to build and refine the pipeline.


This article is part of Wired Sixth's "Intel" series on data strategy and engineering.