Now you can easily start to build some dashboards.
Ben Snively is a Public Sector Specialist Solutions Architect. He works with government, nonprofit, and education customers on big data and analytical projects, helping them build solutions using AWS. In his spare time he adds IoT sensors throughout his house and runs analytics on it.
Now you can see the entities, sentiment over time, and translated tweets.
We added advanced machine learning (ML) services to our flow through some calls within AWS Lambda, and we built a multilingual analytics dashboard with Amazon QuickSight. We have also saved all the data to Amazon S3 so that, if we want, we can do other analytics on the data using Amazon EMR, Amazon SageMaker, Amazon Elasticsearch Service, or other AWS services.
In addition to building a social media dashboard, we want to capture both the raw and enriched datasets and durably store them in a data lake. This allows data analysts to quickly and easily perform new types of analytics and machine learning on this data.
CREATE EXTERNAL TABLE socialanalyticsblog.tweets (
  coordinates STRUCT<type: STRING, coordinates: ARRAY<DOUBLE>>,
  retweeted BOOLEAN,
  source STRING,
  entities STRUCT<
    hashtags: ARRAY<STRUCT<text: STRING, indices: ARRAY<BIGINT>>>,
    urls: ARRAY<STRUCT<url: STRING, expanded_url: STRING, display_url: STRING, indices: ARRAY<BIGINT>>>
  >,
  reply_count BIGINT,
  favorite_count BIGINT,
  geo STRUCT<type: STRING, coordinates: ARRAY<DOUBLE>>,
  id_str STRING,
  timestamp_ms BIGINT,
  truncated BOOLEAN,
  text STRING,
  retweet_count BIGINT,
  id BIGINT,
  possibly_sensitive BOOLEAN,
  filter_level STRING,
  created_at STRING,
  place STRUCT<
    id: STRING,
    url: STRING,
    place_type: STRING,
    name: STRING,
    full_name: STRING,
    country_code: STRING,
    country: STRING,
    bounding_box: STRUCT<type: STRING, coordinates: ARRAY<ARRAY<ARRAY<FLOAT>>>>
  >,
  favorited BOOLEAN,
  lang STRING,
  in_reply_to_screen_name STRING,
  is_quote_status BOOLEAN,
  in_reply_to_user_id_str STRING,
  user STRUCT<
    id: BIGINT,
    id_str: STRING,
    name: STRING,
    screen_name: STRING,
    location: STRING,
    url: STRING,
    description: STRING,
    translator_type: STRING,
    protected: BOOLEAN,
    verified: BOOLEAN,
    followers_count: BIGINT,
    friends_count: BIGINT,
    listed_count: BIGINT,
    favourites_count: BIGINT,
    statuses_count: BIGINT,
    created_at: STRING,
    utc_offset: BIGINT,
    time_zone: STRING,
    geo_enabled: BOOLEAN,
    lang: STRING,
    contributors_enabled: BOOLEAN,
    is_translator: BOOLEAN,
    profile_background_color: STRING,
    profile_background_image_url: STRING,
    profile_background_image_url_https: STRING,
    profile_background_tile: BOOLEAN,
    profile_link_color: STRING,
    profile_sidebar_border_color: STRING,
    profile_sidebar_fill_color: STRING,
    profile_text_color: STRING,
    profile_use_background_image: BOOLEAN,
    profile_image_url: STRING,
    profile_image_url_https: STRING,
    profile_banner_url: STRING,
    default_profile: BOOLEAN,
    default_profile_image: BOOLEAN
  >,
  quote_count BIGINT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<TwitterRawLocation>'
When the launch is finished, you'll see a set of outputs that we'll use throughout this blog post:
Build a set of dashboards using Amazon QuickSight.
Delete the CloudFormation stack (ensure that the S3 bucket is empty prior to deleting the stack).
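If the bucket still holds objects, one way to empty it first is sketched below. This is a minimal sketch, not part of the original walkthrough: the function name `empty_bucket` is hypothetical, and the client is assumed to behave like a boto3 S3 client (`list_objects_v2` and `delete_objects`). A small in-memory stand-in is used so the example runs without AWS access. Buckets with versioning enabled would also need their object versions removed, which this sketch does not cover.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in the bucket, one listing page at a time."""
    deleted = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A listing page holds at most 1,000 keys, which is also the
            # most that a single delete_objects call accepts.
            s3_client.delete_objects(
                Bucket=bucket, Delete={"Objects": keys, "Quiet": True}
            )
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class StubS3:
    """In-memory stand-in for the two S3 calls used above (illustration only)."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        # Like S3, continue after the last key already returned.
        remaining = [
            k for k in self.keys
            if ContinuationToken is None or k > ContinuationToken
        ]
        page = remaining[: self.page_size]
        response = {
            "Contents": [{"Key": k} for k in page],
            "IsTruncated": len(remaining) > self.page_size,
        }
        if response["IsTruncated"]:
            response["NextContinuationToken"] = page[-1]
        return response

    def delete_objects(self, Bucket, Delete):
        doomed = {obj["Key"] for obj in Delete["Objects"]}
        self.keys = [k for k in self.keys if k not in doomed]


stub = StubS3(["raw/a", "raw/b", "raw/c", "entities/d", "sentiment/e"])
removed = empty_bucket(stub, "example-bucket")
```

With a real client, you would pass `boto3.client("s3")` and the bucket name from your CloudFormation output instead of the stand-in.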
Note: You might need to adjust the column widths in the Table view based on your screen resolution to see the last column.
CREATE EXTERNAL TABLE socialanalyticsblog.tweet_entities (
  tweetid BIGINT,
  entity STRING,
  type STRING,
  score DOUBLE
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<entities path from your CloudFormation output>'
The entire processing, analytics, and machine learning pipeline was built without spinning up any servers: Amazon Kinesis ingests the data, Amazon Translate translates tweets between languages, Amazon Comprehend performs sentiment analysis, and Amazon QuickSight creates the dashboards.
In this blog post, we'll show you how you can use Amazon Translate, Amazon Comprehend, Amazon Kinesis, Amazon Athena, and Amazon QuickSight to build a natural-language-processing (NLP)-powered social media dashboard for tweets.
Drop table socialanalyticsblog.tweet_sentiments.
Leverage separate Kinesis data delivery streams within Amazon Kinesis Data Firehose to write the analyzed data back to the data lake.
Trigger AWS Lambda to analyze the tweets using Amazon Translate and Amazon Comprehend, two fully managed services from AWS. With only a few lines of code, these services will allow us to translate between languages and perform natural language processing (NLP) on the tweets.
Let's now pull positive tweets and see their scores from sentiment analysis:
This will create a tweets table. Next we'll do the same and create the entities and sentiment tables. It is important to update both of these with the actual paths listed in your CloudFormation output.
In the CloudFormation console, acknowledge by checking the boxes that allow AWS CloudFormation to create IAM resources and resources with custom names. The CloudFormation template uses serverless transforms. Choose Create Change Set to check the resources that the transforms add, then choose Execute.
You can expand on this dashboard and build analyses such as this one:
Following least privilege patterns, the IAM role that the Lambda function has been assigned only has access to the S3 bucket that the CloudFormation template created.
You can run queries to investigate the data you are collecting. Let's first look at the tables themselves.
Social media interactions between organizations and customers deepen brand awareness. These conversations are a low-cost way to acquire leads, improve website traffic, develop customer relationships, and improve customer service.
Stop the Twitter stream reader if you still have it running.
You can build multiple dashboards, zoom in and out of them, and see the data in different ways. For example, the following is a geospatial chart of the sentiment:
Viral Desai is a Solutions Architect with AWS. He provides architectural guidance to help customers achieve success in the cloud. In his spare time, Viral enjoys playing tennis and spending time with family.
In the AWS Management Console, launch the CloudFormation template.
Note: Refer to the Amazon EC2 documentation for details on how to connect from either a Windows or a Mac/Linux machine.
Most of this has been set up already by the CloudFormation stack, although we will have you add the S3 notification so that the Lambda function is invoked when new tweets are written to S3.
Additionally, you can modify which terms and languages will be pulled from the Twitter streaming API. This Lambda implementation calls Amazon Comprehend once for each tweet. If you'd like to modify the terms to something that may retrieve tens or hundreds of tweets a second, look at performing batch calls or leveraging AWS Glue with triggers to perform batch processing instead of stream processing.
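To make the batch-call idea concrete, here is a minimal sketch that groups tweet texts into batches before calling Amazon Comprehend. It is not the code from this post's Lambda function. The helper names `chunked` and `batch_sentiments` are hypothetical, the client is assumed to behave like a boto3 Comprehend client (`batch_detect_sentiment`, which accepts up to 25 documents per call), and a stand-in client is used so the example runs offline.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiments(texts, comprehend_client, language_code="en", batch_size=25):
    """Return one sentiment label per text, using one API call per batch."""
    sentiments = [None] * len(texts)  # documents that fail stay None
    offset = 0
    for batch in chunked(texts, batch_size):
        response = comprehend_client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        # Each result carries the index of its document within this batch.
        for result in response["ResultList"]:
            sentiments[offset + result["Index"]] = result["Sentiment"]
        offset += len(batch)
    return sentiments


class StubComprehend:
    """Offline stand-in that labels everything NEUTRAL and counts calls."""

    def __init__(self):
        self.calls = 0

    def batch_detect_sentiment(self, TextList, LanguageCode):
        self.calls += 1
        results = [
            {"Index": i, "Sentiment": "NEUTRAL"} for i in range(len(TextList))
        ]
        return {"ResultList": results, "ErrorList": []}


stub = StubComprehend()
labels = batch_sentiments([f"tweet {n}" for n in range(60)], stub)
```

In this run, 60 tweets cost three calls instead of 60. Batching only helps when all texts in a batch share a language code, so translated tweets are the natural input.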
Delete the S3 bucket that the CloudFormation template created.
First, run this command, replacing the path highlighted in the following example, to create the entities table:
Drop table socialanalyticsblog.tweet_entities.
After the CloudFormation stack launch is completed, go to the outputs for direct links and information. Then click the LambdaFunctionConsoleURL link to launch directly into the Lambda function.
Leverage Amazon Athena to query the data stored in Amazon S3.
The only server used in the example is outside the actual ingestion flow from Kinesis Data Firehose. It's used to collect the tweets from Twitter and push them into Kinesis Data Firehose. In a future post, we'll show you how you can shift this component to also be serverless.
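As a rough illustration of what that collector does with each tweet, the sketch below serializes a tweet as one JSON object per line and hands it to Kinesis Data Firehose. This is not the reader code from the CloudFormation stack. The function names and the stream name are hypothetical, and the client is assumed to behave like a boto3 Firehose client (`put_record`). The trailing newline matters because Firehose concatenates records as-is, and the JSON SerDe used in Athena expects one object per line.

```python
import json


def to_firehose_record(tweet):
    """Serialize one tweet as a newline-terminated JSON document."""
    return {"Data": (json.dumps(tweet) + "\n").encode("utf-8")}


def send_tweet(tweet, firehose_client, stream_name):
    """Push one tweet into the named delivery stream."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(tweet),
    )


class StubFirehose:
    """Offline stand-in that remembers what was sent."""

    def __init__(self):
        self.sent = []

    def put_record(self, DeliveryStreamName, Record):
        self.sent.append((DeliveryStreamName, Record["Data"]))
        return {"RecordId": str(len(self.sent))}


stub = StubFirehose()
# "example-raw-stream" is a placeholder, not a name created by the stack.
send_tweet({"id": 1, "text": "hi"}, stub, "example-raw-stream")
```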
The Lambda function calls Amazon Translate and Amazon Comprehend to perform language translation and natural language processing (NLP) on tweets. The function uses Amazon Kinesis to write the analyzed data to Amazon S3.
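The enrichment step can be sketched roughly as follows. This is an illustration under stated assumptions, not the function that the CloudFormation stack deploys. The name `enrich_tweet` is hypothetical, the clients are assumed to behave like boto3's Translate and Comprehend clients (`translate_text`, `detect_sentiment`, `detect_entities`), and the output field names are assumed to mirror the columns of the sentiment and entities tables used in this post. Stand-in clients with canned responses keep the example runnable without AWS access.

```python
def enrich_tweet(tweet, translate_client, comprehend_client, target_lang="en"):
    """Translate one tweet if needed, then extract sentiment and entities."""
    original_text = tweet["text"]
    source_lang = tweet["lang"]  # assumption: Twitter's code matches the AWS code

    text = original_text
    if source_lang != target_lang:
        translated = translate_client.translate_text(
            Text=original_text,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        text = translated["TranslatedText"]

    sentiment = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=target_lang
    )
    scores = sentiment["SentimentScore"]
    sentiment_record = {
        "tweetid": tweet["id"],
        "text": text,
        "originalText": original_text,
        "sentiment": sentiment["Sentiment"],
        "sentimentPosScore": scores["Positive"],
        "sentimentNegScore": scores["Negative"],
        "sentimentNeuScore": scores["Neutral"],
        "sentimentMixedScore": scores["Mixed"],
    }

    entities = comprehend_client.detect_entities(
        Text=text, LanguageCode=target_lang
    )
    entity_records = [
        {"tweetid": tweet["id"], "entity": e["Text"],
         "type": e["Type"], "score": e["Score"]}
        for e in entities["Entities"]
    ]
    return sentiment_record, entity_records


class StubTranslate:
    """Offline stand-in that returns a canned translation."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        return {"TranslatedText": "I love my new shoes"}


class StubComprehend:
    """Offline stand-in that returns canned analysis results."""

    def detect_sentiment(self, Text, LanguageCode):
        return {"Sentiment": "POSITIVE",
                "SentimentScore": {"Positive": 0.9, "Negative": 0.02,
                                   "Neutral": 0.07, "Mixed": 0.01}}

    def detect_entities(self, Text, LanguageCode):
        return {"Entities": [
            {"Text": "shoes", "Type": "COMMERCIAL_ITEM", "Score": 0.8}
        ]}


demo_tweet = {"id": 1, "lang": "de", "text": "Ich liebe meine neuen Schuhe"}
demo_sentiment, demo_entities = enrich_tweet(
    demo_tweet, StubTranslate(), StubComprehend()
)
```

Running the analysis on the translated text keeps a single language code for Amazon Comprehend, while the original text is kept alongside it for the dashboard.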
After running these four statements and replacing the locations for the create table statements, you should be able to select the socialanalyticsblog database in the dropdown list and see the three tables:
After a few minutes, you should be able to see the various datasets in the S3 bucket that the CloudFormation template created:
Note: At the time of this blog post, Amazon Translate is in preview. If you do not have access to Amazon Translate, only include the en (English) value.
In this blog post, we'll build a serverless data processing and machine learning (ML) pipeline that provides a multilingual social media dashboard of tweets within Amazon QuickSight. We'll leverage API-driven ML services that allow developers to easily add intelligence, such as computer vision, speech, language analysis, and chatbot functionality, to any application simply by calling a highly available, scalable, and secure endpoint. These building blocks will be put together with very little code by leveraging serverless offerings within AWS. For this blog post, we will be performing language translation and natural language processing on the tweets flowing through the system.
You can also start to query the translation details. Even if I don't know the German word for shoe, I could easily do the following query:
After the CloudFormation stack is launched, wait until it is complete.
CREATE EXTERNAL TABLE socialanalyticsblog.tweet_sentiments (
  tweetid BIGINT,
  text STRING,
  originalText STRING,
  sentiment STRING,
  sentimentPosScore DOUBLE,
  sentimentNegScore DOUBLE,
  sentimentNeuScore DOUBLE,
  sentimentMixedScore DOUBLE
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<sentiment path from your CloudFormation output>'
Give the query a name, such as SocialAnalyticsBlogQuery.
This saves the query and lets you see sampled data.
Note: With the way I created the custom query, you'll want to count the distinct tweetids as the value.
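The reason is that the custom query joins each tweet to its entities, so a tweet with several entities appears on several rows. The small example below, using made-up data, shows how a plain row count overstates the number of tweets while a distinct count does not.

```python
# Made-up rows standing in for the sentiment and entity tables.
sentiments = [
    {"tweetid": 1, "sentiment": "POSITIVE"},
    {"tweetid": 2, "sentiment": "NEGATIVE"},
]
entities = [
    {"tweetid": 1, "entity": "Kindle"},
    {"tweetid": 1, "entity": "Amazon"},
    {"tweetid": 1, "entity": "Seattle"},
    {"tweetid": 2, "entity": "Echo"},
]

# Inner join on tweetid: one output row per (tweet, entity) pair.
joined = [
    {**s, **e}
    for s in sentiments
    for e in entities
    if s["tweetid"] == e["tweetid"]
]

row_count = len(joined)                                    # counts pairs
distinct_tweets = len({row["tweetid"] for row in joined})  # counts tweets
```

Here `row_count` is 4 while `distinct_tweets` is 2, which is why the visual should count distinct tweetids.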
And now run this command to create the sentiments table:
Let's also look at the non-English tweets that have Kindle extracted through NLP:
select entity, type, count(*) cnt
from socialanalyticsblog.tweet_entities
where type = 'COMMERCIAL_ITEM'
group by entity, type
order by cnt desc
limit
Note: At the time of this blog post, Amazon Translate is still in preview. In production workloads, use the multilingual features of Amazon Comprehend until Amazon Translate becomes generally available (GA).
Take some time to examine the rest of the code. With a few lines of code, we can call Amazon Translate to convert between Arabic, Portuguese, Spanish, French, German, English, and many other languages.
Leverage Amazon Kinesis Data Firehose to easily capture, prepare, and load real-time data streams into data stores, data warehouses, and data lakes. In this example, we'll use Amazon S3.
Throughout this blog post, we'll show how you can do the following:
Then configure the trigger with the new S3 bucket that CloudFormation created, using the raw/ prefix. The event type should be Object Created (All).
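Once the trigger is in place, the Lambda function receives an S3 event notification for each new object. The sketch below shows one way to pull the bucket and key out of such an event. It is an illustration, not the deployed function: the helper name `raw_objects` and the sample bucket and key are hypothetical. Object keys arrive URL-encoded in the event, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def raw_objects(event, prefix="raw/"):
    """Yield (bucket, key) for each object-created record under the prefix."""
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            yield bucket, key


# A trimmed-down notification with only the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2018/03/tweets%3Apart-1"},
            },
        },
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "sentiment/2018/03/part-1"},
            },
        },
    ]
}

found = list(raw_objects(sample_event))
```

The prefix check is a second line of defense. The trigger's own raw/ prefix filter should already keep the enriched output from invoking the function again.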
Learn how to detect sentiments in customer reviews with Amazon Comprehend.
After you have created these resources, you can remove them by following these steps.
Note: The code in this blog assumes that the language codes used by Twitter are the same as those used by Amazon Translate and Amazon Comprehend. The code could easily be expanded on, but if you are adding new labels, confirm that this assumption still holds, unless you also update the AWS Lambda code.
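If you do need to update the Lambda code, one option is a small lookup that maps Twitter's code to the code the AWS services expect and rejects anything unsupported. The sketch below is hypothetical and not part of the post's code. The override entries are assumptions about legacy codes that Twitter may return, so verify them, along with the supported set, against the current Twitter, Amazon Translate, and Amazon Comprehend documentation.

```python
# Assumption: legacy codes Twitter may emit, mapped to their modern forms.
LEGACY_CODE_OVERRIDES = {"in": "id", "iw": "he"}


def to_service_language(twitter_lang, supported, overrides=LEGACY_CODE_OVERRIDES):
    """Return the code to send to the AWS services, or None if unsupported."""
    code = overrides.get(twitter_lang, twitter_lang)
    return code if code in supported else None


# Example set only; use the languages you configured for the stack.
supported_languages = {"en", "de", "es", "id"}

german = to_service_language("de", supported_languages)
indonesian = to_service_language("in", supported_languages)
undetermined = to_service_language("und", supported_languages)
```

Returning `None` lets the caller skip translation and analysis for a tweet instead of sending a code that the service would reject.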
SELECT s.*,
       e.entity,
       e.type,
       e.score,
       t.lang AS language,
       coordinates.coordinates[1] AS lon,
       coordinates.coordinates[2] AS lat,
       place.name,
       place.country,
       t.timestamp_ms / 1000 AS timestamp_in_seconds,
       regexp_replace(source, '<.+?>', '') AS src
FROM socialanalyticsblog.tweets t
JOIN socialanalyticsblog.tweet_sentiments s
  ON (s.tweetid = t.id)
JOIN socialanalyticsblog.tweet_entities e
  ON (e.tweetid = t.id)
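To see what the derived columns in this query produce, the sketch below recomputes three of them in Python for one made-up tweet. It assumes that the timestamp is divided by 1000 to turn milliseconds into seconds, that the regular expression strips the HTML tag around the source application name, and that the coordinates array is ordered longitude first. The function name `derive_columns` and the sample values are hypothetical.

```python
import re
from datetime import datetime, timezone


def derive_columns(tweet):
    """Recompute lon, lat, timestamp_in_seconds, and src for one tweet."""
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    lon, lat = (coords[0], coords[1]) if coords else (None, None)
    return {
        "lon": lon,
        "lat": lat,
        # Integer division mirrors dividing a BIGINT by an integer in SQL.
        "timestamp_in_seconds": int(tweet["timestamp_ms"]) // 1000,
        # Non-greedy match removes each tag and keeps the text in between.
        "src": re.sub(r"<.+?>", "", tweet["source"]),
    }


sample_tweet = {
    "timestamp_ms": 1520000000000,
    "source": '<a href="http://twitter.com/download/iphone" '
              'rel="nofollow">Twitter for iPhone</a>',
    "coordinates": {"type": "Point", "coordinates": [-122.33, 47.61]},
}

derived = derive_columns(sample_tweet)
as_date = datetime.fromtimestamp(derived["timestamp_in_seconds"], tz=timezone.utc)
```

The seconds value is what lets the field be treated as a date in the dashboard. Tweets without location data have no coordinates, so `lon` and `lat` come back empty for them.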
You will need to create an app on Twitter. Create a consumer key (API key), consumer secret key (API secret), access token, and access token secret, and use them as parameters in the CloudFormation stack. You can create them using this link.
Build a social media dashboard using machine learning and BI services
Switch the data type for timestamp_in_seconds to be a date.
select ts.text, ts.originalText
from socialanalyticsblog.tweet_sentiments ts
join socialanalyticsblog.tweets t on (ts.tweetid = t.id)
where lang = 'de' and ts.text like '%Shoe%'
This starts the flow of tweets. If you want to keep the flow running, simply run it as a job. For testing, you can also keep the SSH tunnel open.
Use SSH to connect to the Amazon Linux EC2 instance that the CloudFormation stack created.
Note: If you don't see all three prefixes, check the following. If you don't see any data, make sure the Twitter reader is reading correctly and not creating errors. If you only see a raw prefix and not the others, make sure that the S3 trigger is set up on the Lambda function.
We've provided you with an AWS CloudFormation template that will create all the ingestion components shown in the previous diagram, except for the Amazon S3 notification for AWS Lambda (depicted as the dotted blue line).
Suppose on the timeline, we only want to see positive/negative/mixed sentiments. The Neutral line, at least for my Twitter terms, is causing the rest not to be seen easily.
In Athena, run the following commands to create the Athena database and tables:
select lang, ts.text, ts.originalText
from socialanalyticsblog.tweet_sentiments ts
join socialanalyticsblog.tweets t on (ts.tweetid = t.id)
where lang != 'en'
  and ts.tweetid in (select distinct tweetid from tweet_entities where entity = 'Kindle')
Instead of running the Amazon EC2 instance that reads the Twitter firehose, you could leverage AWS Fargate to deploy that code as a container. AWS Fargate is a technology for Amazon Elastic Container Service (ECS) and Amazon Elastic Container Service for Kubernetes (EKS) that allows you to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, and scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing. Using AWS Fargate, you can focus on designing and building your applications instead of managing the infrastructure that runs them.
We are going to manually create the Amazon Athena tables. This is a great place to leverage AWS Glue crawling features in your data lake architectures. The crawlers will automatically discover the data format and data types of your different datasets that live in Amazon S3, as well as in relational databases and data warehouses. More details can be found in the documentation for Crawlers with AWS Glue.
Delete the Athena tables and database (socialanalyticsblog).
Let's step through adding one more visual to this analysis to show the translated tweets:
The results show a tweet talking about shoes based on the translated text:
The same is true for adding natural language processing into the application using Amazon Comprehend. Note how easily we were able to perform the sentiment analysis and entity extraction on the tweets within the Lambda function.
IMPORTANT: Replace TwitterRawLocation with what is shown as an output of the CloudFormation script:
The following diagram shows both the ingest (blue) and query (orange) flows.
Part of the CloudFormation stack outputs includes an SSH command that can be used on many systems to connect to the instance.
Updated March , tocreate moreQUESTions.
Note: Technically you don't have to use the fully qualified table names if the database is selected in Athena, but I did that to limit people having issues if they didn't select the socialanalyticsblog database first.
This will launch the CloudFormation stack automatically into the us-east-1 Region with the following settings: