
Twitter Avatar Workflow

This workflow hosts the Twitter Avatar 5-Series, version 1.0, a tool for customer avatar research. An introductory article for the tool is available through the Medium link in the rubric inside the workflow.

Changelog
December 9th, 2022: Made some changes to the workflow annotation and edited the configuration instructions for the first component regarding the maximum radius allowed by Twitter.

INSTALLATION INSTRUCTIONS

Upon opening this workflow, you'll be prompted to install the necessary extensions. Do that, then save the workflow and restart KNIME.

Once KNIME restarts, you'll be prompted to install NodePit, since some of the nodes here are exclusive to NodePit. To install NodePit, follow the instructions below, which are taken from https://nodepit.com/product/nodepit/installation :

1. Look at the top left corner of your KNIME window. In the menu, go to File → Preferences → Install/Update → Available Software Sites.
2. Click on Add…, give the extension a name (NodePit), and paste this value into the Location field: https://download.nodepit.com/4.7 (At the time of writing, the latest NodePit version is 4.7. You may go to nodepit.com to find out what the latest version is.)
3. Once added, save and close the preferences window, choose File → Install KNIME Extensions…, choose Work with: -- All Available Sites -- and under NodePit, select only "NodePit for KNIME" for installation. Once installed, restart KNIME when prompted to do so. Save your workflow before restarting.
4. Once KNIME restarts, you'll be prompted to install the Palladian extension. Follow the popup instructions. If you're asked to trust unsigned software, tick the required box to give permission for installation.
5. Save the workflow and restart KNIME again.

IMPORTANT! From time to time, right-click on the components and choose Component > Update Link to apply the most recent changes made by the author, who will periodically upgrade the current version when the need arises. Also, check on the KNIME Hub whether there are changes to this workflow by referring to the changelog in the description section of the workflow page (bottom of the page).

GENERAL RUBRIC

1. Tweet query: There are 3 kinds of queries:

(i) Brand-based query: This is where you type in the handle names of the company's or brand's Twitter account. All Twitter handles start with @. For example, to study Target's or Walmart's customers, type in "@Target OR @Walmart" (without the quotation marks). The OR operator is important when you're studying more than 1 Twitter handle. It queries for tweets that contain either of the mentioned handles. Using an AND operator is not recommended because it queries only tweets which contain both handles, severely limiting the quantity of tweets you'll receive. If you're studying your own company or brand but your Twitter following is small, you can add your competitors' handles to the query to get more results, as long as you're confident that the customer profiles are very similar. Note that queries of multiple Twitter handles without any operator in between will automatically use the AND operator, so make sure to explicitly type the OR operator between the different handles to avoid mistakes. An operator must be in capital letters, as in the example.
For this type of query, the recommended amount of input data to get a sizeable number of clusters is more than 50 non-overlapping coordinates and more than 5 brands. Rough estimate of query time: at least 2 hours.

(ii) Keyword-based query: This is a query relying solely on keywords, without specifying any Twitter handles. For example, if you are profiling people who are interested in "MMORPG", your query might look something like this: MMORPG OR RPG OR "role-playing game" OR "roleplaying game" (with quotation marks enclosing all words that belong to the same phrase, and OR operators separating the elements).
For this type of query, the recommended amount of input data to get a sizeable number of clusters is more than 50 non-overlapping coordinates and more than 5 keyword variations. Rough estimate of query time: at least 3 hours.

(iii) Mix query: A combination of handles and keywords. An example would be @Adidas AND "new shoes" to study profiles who intend to buy new shoes from Adidas, or who have just bought new shoes from the brand. Notice that the AND operator is used here instead of OR, because it suits the purpose of the query. If an OR operator were used instead, you wouldn't know whether the profiles tweeting "new shoes" are interested in the Adidas brand or not. Like the OR operator, the AND operator must be capitalized. Mix queries may also take a slightly more complex form, combining AND, OR, and parentheses. Just like in a mathematical equation, everything within a pair of parentheses is treated as one group. For example, (@adidas OR @nike) AND ("new shoes" OR "new pair of shoes") will query for tweets containing either the phrase "new shoes" or "new pair of shoes" plus either one of the 2 brands. As mentioned earlier, using an AND operator severely limits the quantity of tweets extracted, so when AND is used in the query it's wise to include as many keyword variations as possible, separated by OR, to cast a wider net.
For this type of query, it's difficult to recommend a good amount of data points to get a sizeable number of clusters, but you may run trial experiments and adjust accordingly. Expect to get failed results (indicated by the red X sign beneath the component) when performing this type of query.

2. The workflow feeds on geo-coordinate data to specify the location of the tweets. The location of the tweets is assumed to be similar to where the Twitter users live. This is a limitation of the API due to privacy matters. You will need to prepare an Excel file with a single column. The first row of the column must contain the string value "Coordinates" (without the quotation marks, with the exact spelling and the initial letter C capitalized). From the second row onwards, the cell values must be geo-coordinates in the following format: "38.889,-77.050" (without the quotation marks, and with a comma separating the latitude from the longitude). The coordinates must not contain any characters other than digits, dots as decimal points, a comma as the separator, and a minus sign for negative values. Positive values do not need a + sign in front. Make sure the column does not have repeated coordinates, empty cells, or misformatted coordinates. Coordinates in degree formats are not accepted.

3. If this is your first time using KNIME, note that this workflow needs a Java heap space allocation of at least 10000 megabytes. Set this by changing the Xmx value in the default knime.ini file to -Xmx10000m. Please refer to the official forum at https://forum.knime.com/search?q=heap%20space to learn where to locate the knime.ini file and how to alter the heap space allocation.

4. The queries are automatically set to extract only tweets that are at most 7 days old at the time of execution. This ensures the data you receive is fresh. However, this setting also means that your results can vary significantly from day to day even if your queries are exactly the same each time, because Twitter users are active and the topics they talk about may differ from day to day.

5. For every 100 coordinate points, expect to need roughly 1 to 3 gigabytes of storage. This space is freed once you erase all data by resetting the workflow. The first component (Twitter Avatar - Querying) is fully automated and does all the query work for you. Because queries have rate limits, it is the most time-consuming part of this workflow, and its duration depends on how many tweets are available for the query made. For this reason, once you've started the first step, make sure the internet connection stays stable without a single interruption, and leave the computer alone until Step 1 is completed, to avoid crashes. Never run all components at once, and always save the workflow after Step 1 is completed before proceeding to Step 2. From Step 2 onwards, you'll interact with the components rather than sitting and waiting for the results. The total time to finish Steps 2 to 5 depends on how fast you go through the instructions in the popup windows.

6. The Twitter queries use both the V1 and V2 Twitter API query systems. For the V1 API, KNIME's native Twitter nodes are used. For the V2 API, Palladian's HTTP Retriever node is used. All you need to do is provide all 5 tokens, and the first component takes care of the automatic querying process for you. You'll need a Twitter developer account with Elevated Access to use this workflow, due to the Twitter rate limits. Not all V2 developer accounts have this privilege, let alone V1, so check whether you have Elevated Access before proceeding. Without it, the workflow will exceed the rate limits.

7. By default, this workflow keeps only tweets written in English, relying on the operator "lang:en", which is already built into the tool. Languages other than English are not supported, but you may still occasionally see tweets with non-English content in the results.

8. CAUTION! Security reminder: Never share this workflow with anyone once you've saved it on your computer, unless you're sharing it in a non-executed state with all 5 token fields wiped out. A saved workflow contains your data, including the access tokens. For the same reason, it's not recommended to save this workflow on a shared computer. If you're sharing a computer, always reset all components after you've finished exporting the PDF files through the last component in Step 5. Once you're done using the tool, here's the proper way to reset everything and erase all data:
i. In the configuration window of the Step 1 component (Twitter Avatar - Querying), delete all entries in all fields.
ii. Hit 'Apply', then hit 'OK'.
iii. Save your workflow.
As long as you follow those 3 steps in order, you're good to go. While this is the most secure way to reset the workflow, please be advised that it removes all data from all components too. That's why it's important to do it only after you've successfully exported the PDF files in Step 5.

Last but not least, first-time users of the tool should read my Medium articles for tutorials and case studies I made using this tool: https://advertstoday.medium.com/
Case study requests (for people without Elevated Access) or any inquiries or feedback regarding the tool can be sent to my LinkedIn account at https://www.linkedin.com/in/najmi-akibi/

Twitter Avatar 5-Series

Step 1
1. Right-click and choose 'Configure'.
2. Follow the instructions in the window that pops up.
3. Hit 'Apply' > 'OK'.
4. Right-click on this component again and choose 'Execute'.
5. Trust the process and wait until you see a green light.
6. Save this workflow before proceeding to the next step.

Step 2
1. Right-click and choose 'Execute and Open Views'.
2. Follow the instructions in the popup window.

Step 3
1. Right-click and choose 'Execute and Open Views'.
2. Follow the instructions in the popup window.
3. Wait a few seconds for the light to turn green before proceeding to the next step.

Step 4
1. Right-click and choose 'Execute and Open Views'.
2. Follow the instructions in the popup window.

Step 5
1. Right-click and choose 'Execute and Open Views'.
2. Follow the instructions in the popup window.

You're all set! By now, you should have exported the PDF reports to your computer. You may now choose either to save the workflow or to follow the instructions in the rubric to erase all API tokens and reset the workflow.

Components
Twitter Avatar (Persona Profiling)
Twitter Avatar (Cluster Profiling)
Twitter Avatar (Humanizing)
Twitter Avatar (PDF Generator)
Twitter Avatar (Querying & Clustering)
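Since Step 1 can run for hours, it may help to sanity-check its two inputs beforehand. The sketch below is not part of the workflow. It is a minimal, standalone Python illustration of the query rules (capitalized OR/AND, quoted phrases, parenthesized groups) and the coordinate rules (header "Coordinates", "lat,lon" format, no empty or repeated cells) from the General Rubric. The function names are hypothetical, the column is assumed to have already been read from the Excel file into a list of strings, and the latitude/longitude range check is an extra assumption that the rubric does not state.

```python
import re

# One cell: "lat,lon" in decimal degrees, optional minus sign, no "+" and no spaces.
COORD_RE = re.compile(r"-?\d+(\.\d+)?,-?\d+(\.\d+)?")


def quote(term):
    """Wrap multi-word phrases in double quotes; leave single words and handles as-is."""
    return f'"{term}"' if " " in term else term


def any_of(terms):
    """Join alternatives with an explicit, capitalized OR."""
    return " OR ".join(quote(t) for t in terms)


def all_of(groups):
    """Join groups with AND, parenthesizing groups that hold several alternatives."""
    parts = []
    for group in groups:
        joined = any_of(group)
        parts.append(f"({joined})" if len(group) > 1 else joined)
    return " AND ".join(parts)


def check_coordinates(cells):
    """Return (row_number, problem) tuples for a column given as a list of strings.

    `cells` is the column from top to bottom, header row included.
    """
    problems = []
    if not cells or cells[0] != "Coordinates":
        problems.append((1, "first row must be exactly 'Coordinates'"))
    seen = set()
    for row, cell in enumerate(cells[1:], start=2):
        if cell is None or cell == "":
            problems.append((row, "empty cell"))
            continue
        if not COORD_RE.fullmatch(cell):
            problems.append((row, "misformatted coordinate"))
            continue
        lat, lon = (float(part) for part in cell.split(","))
        # Assumption: plain geographic bounds, not a rule taken from the rubric.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((row, "latitude/longitude out of range"))
        elif (lat, lon) in seen:
            problems.append((row, "repeated coordinate"))
        seen.add((lat, lon))
    return problems


if __name__ == "__main__":
    print(any_of(["@Target", "@Walmart"]))
    # @Target OR @Walmart
    print(all_of([["@adidas", "@nike"], ["new shoes", "new pair of shoes"]]))
    # (@adidas OR @nike) AND ("new shoes" OR "new pair of shoes")
    print(check_coordinates(["Coordinates", "38.889,-77.050", "38.889,-77.050", ""]))
```

An empty list from check_coordinates means the column passed these checks. It does not guarantee that Twitter will return tweets for those locations.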
