I mean, coding sometimes feels like magic, so maybe believe it's so?
I have been wanting to write this blog post for a while now, both as a public document for others to reference and for myself to serve as a form of documentation, but have always... not done it. Better late than never, I guess. 😂
Every now and then, I have to correct an incorrectly recorded prompt word on vss365today.com. Sometimes, I also see tweets that describe the site as the source of the #vss365 writing prompt, as if the word were randomly chosen by a computer or the site were where the host posts the day's prompt. As the site has grown in popularity (there are now 254 emails sent out every day, and there have been over 3,500 unique visitors in the last 30 days), I feel an increasing need to explain how the site operates and how it seeks to complement the #vss365 community.
This post is written in an FAQ-type format to make the information easier to follow. It may be updated as information changes or new questions arise. There are a few technical references and explanations, but I have kept them as simple as possible for non-technical readers. 🙂
Where do the words come from?
This feels like something I should not have to write but am going to anyway for clarity's sake.
All #vss365 activities occur on Twitter. The way the writing game works is simple.
- A specific person is designated as the Host or Hostess for a specific month. The master list of Host/Hostesses is currently maintained by Arthur Unk. Future Hosts/Hostesses are announced in advance so people can follow them.
- Every day during their designated month, the Host/Hostess posts a tweet containing the word (sometimes referred to as the writing prompt). Only the Host/Hostess knows the word list ahead of time!
- People then write very short stories (hence vss) in any style and genre they wish using the word. Writers typically also use the word for thematic inspiration, but that is not required. Anyone who wants to join can join! It's not a competition, just a fun way to write something in a very brief format.
How does #vss365 today fit into the #vss365 community?
As I wrote in January 2019, I created the site as a way to get the prompt into the hands and minds of more writers more easily. However, I started it for a selfish reason. A writer friend (and amazing college instructor), Dr. B, had introduced my friends and me to #vss365 a few weeks before, and I really enjoyed it. But due to the nature of Twitter, it was really easy for me to miss the prompt. Being a programmer, I knew there had to be an easier way to distribute it. As I started creating the site, I realized it would be beneficial to the greater community. So, I put it on the public internet and the rest is history.
#vss365 today fits into the #vss365 community by serving its intended purpose: making the prompt easier for writers to find. It also provides a 100% complete archive of every word used since the inception of the game, along with a way to search them quickly. Hopefully soon, it will also better serve the Hosts/Hostesses when building their list of words by providing a downloadable version of the word archive.
In short, it fits into the community by complementing the community. It cannot and will not replace how the game and community operate. Instead, it seeks to help them continue to provide a fun, welcoming, and diverse environment for writers of all skill levels and backgrounds.
How does #vss365 today find the Host/Hostess?
There is a list containing Hosts/Hostesses and their assigned date(s). This list is different from the master list in that it only contains the announced Hosts/Hostesses. Every day, this list is referenced by an automatic process to determine who is giving out the words. Whenever a new Host/Hostess is announced, I add them to this list.
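For the technically curious, the daily "who is hosting?" lookup can be pictured as a date-range search over that list. This is a minimal sketch, not the site's actual code: the `HostAssignment` structure, the `host_for` function, and the example handles are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class HostAssignment:
    handle: str  # Twitter handle of the Host/Hostess (hypothetical examples below)
    start: date  # first day of their assignment (inclusive)
    end: date    # last day of their assignment (inclusive)


def host_for(day: date, assignments: List[HostAssignment]) -> Optional[str]:
    """Return whoever is giving out the word on `day`, if they have been announced."""
    for a in assignments:
        if a.start <= day <= a.end:
            return a.handle
    return None  # nobody announced for this date yet


ASSIGNMENTS = [
    HostAssignment("@ExampleHostA", date(2020, 7, 1), date(2020, 7, 31)),
    HostAssignment("@ExampleHostB", date(2020, 8, 1), date(2020, 8, 31)),
]

print(host_for(date(2020, 7, 15), ASSIGNMENTS))  # @ExampleHostA
print(host_for(date(2020, 9, 1), ASSIGNMENTS))   # None
```

Returning `None` for an unannounced date mirrors the point above: the list only knows about Hosts/Hostesses once they have been announced and added.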
How does #vss365 today find the day's word?
At a set time every day, an automatic process (I call it the "finder") looks for the prompt tweet. It starts by making sure a word has not already been recorded for that day.* If it hasn't, the finder uses the Twitter API to read the assigned Host/Hostess's tweets and look for the prompt tweet. If it's found, all relevant information is extracted and recorded in the word archive.
* OK, so this is a general assumption. The game is based on the premise that there is one and only one word per day. However, this has not always been the case. As of this writing, since the beginning of the game, there have been six days where more than one word was given out. Obviously, these additional words should be recorded.
While the finder process runs automatically, it can also be run manually. When run manually, it asks if this is an additional prompt for the given date. If answered affirmatively, the date check is skipped and the prompt is recorded as if one does not already exist for the day. This adds a small amount of work in other places to ensure one or all prompts are correctly referenced but is insignificant to the bigger picture.
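The finder's overall flow, including the "additional prompt" override, might look roughly like this. It is a hedged sketch: the in-memory `archive` dictionary stands in for the site's database, and `fetch_prompt` is a hypothetical placeholder for the Twitter API lookup described above.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-in for the word archive: date -> words recorded that day.
Archive = Dict[date, List[str]]


def run_finder(
    day: date,
    archive: Archive,
    fetch_prompt: Callable[[date], Optional[str]],
    additional: bool = False,
) -> Optional[str]:
    """Record the day's prompt word unless one already exists.

    `additional=True` mirrors answering "yes" to the manual
    "is this an additional prompt?" question: the date check is skipped.
    """
    if not additional and archive.get(day):
        return None  # a word is already recorded for this date
    word = fetch_prompt(day)  # placeholder for reading the host's tweets
    if word is None:
        return None  # prompt tweet not found (yet)
    archive.setdefault(day, []).append(word)
    return word


archive: Archive = {}
today = date(2020, 7, 1)
run_finder(today, archive, lambda d: "door")                      # recorded
run_finder(today, archive, lambda d: "window")                    # skipped: already recorded
run_finder(today, archive, lambda d: "window", additional=True)   # second prompt recorded
print(archive[today])  # ['door', 'window']
```

Storing a list per date is one simple way to let a day hold more than one prompt, which is the "small amount of work in other places" mentioned above.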
Why is the word sometimes recorded incorrectly?
This is where we get into some technical detail but I'll keep it as simple as possible.
Computers have a hard time understanding language. Even with advances in machine learning and artificial intelligence, computers struggle with the meaning of the spoken and written word. For example, take the slang phrase "bombed it." Depending on context, this phrase can refer to:
- a literal bomb
- doing something very well
- doing something very poorly
How does a computer know which meaning is intended? As people, we intuitively use context or ask for clarification. Computers cannot do this. There is too much nuance for an accurate assessment. Even sentiment analysis is only a suggestion because computers lack all ability to think in an intuitive manner.
On the other hand, the grammar and syntax of the written word tend to be more defined. While there is still variety, inconsistency, and evolution in written text, merely extracting words and phrases from it without interpretation is easier for computers to do. In the case of Twitter, the #vss365 ambassadors had the insight to require the prompt word to be a hashtag. The game further evolved to include #prompt to distinguish the prompt tweet from the people using the word in their stories. These things make it a lot easier to pull out the word.
The finder process identifies the prompt tweet by searching for the combination of the hashtags #vss365 and #prompt. If both are found, the tweet is assumed to be the day's prompt tweet. (There is actually a tiny bit more to this involving time zones, but that is a technical detail irrelevant to the concept.)
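Here is a rough sketch of that "both hashtags present" test, assuming the two required hashtags are #vss365 and #prompt and that matching ignores letter case (both assumptions on my part). The regular expression and function names are hypothetical, and the time zone detail is left out.

```python
import re
from typing import List

HASHTAG = re.compile(r"#(\w+)")  # '#' followed by letters, digits, or underscores


def hashtags(text: str) -> List[str]:
    """Return a tweet's hashtags, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def is_prompt_tweet(text: str) -> bool:
    """A tweet counts as the prompt tweet only if it carries BOTH hashtags."""
    return {"vss365", "prompt"} <= set(hashtags(text))


print(is_prompt_tweet("Today's #vss365 #prompt is #lantern"))  # True
print(is_prompt_tweet("My #vss365 story about a #lantern"))    # False
```

Requiring both hashtags is what separates the Host/Hostess's tweet from the many stories that also use #vss365.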
The tweet contents are searched for hashtags, which are compared to a blocklist. This blocklist contains hashtags that should never be considered as a possible word, such as #vss365a (which was the specific #vss365 anthology hashtag). It also contains all words already used that month plus slight variations (like door and doors). This is done because some Hosts/Hostesses have included the previous day's word as a hashtag in the next day's prompt tweet, and that hashtag should not be mistaken for the new word.
Once all hashtags are filtered out, typically the only hashtag left is the prompt word. At that point, we declare that hashtag to be the prompt word and move on with recording all required information.
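The filtering step could be sketched like this. The fixed entries in `BASE_BLOCKLIST` are taken from the text above, but the "variations" rule (adding or dropping a trailing "s") is a deliberately simple stand-in of my own; the real list is curated by hand for each Host/Hostess.

```python
import re
from typing import Iterable, List, Set

HASHTAG = re.compile(r"#(\w+)")

# Always-ignored hashtags (examples from the text); the real list is hand-curated.
BASE_BLOCKLIST = {"vss365", "prompt", "vss365a"}


def variations(word: str) -> Set[str]:
    """Simplistic 'slight variations': the word with and without a trailing 's'."""
    word = word.lower()
    return {word, word + "s", word[:-1] if word.endswith("s") else word}


def build_blocklist(used_this_month: Iterable[str]) -> Set[str]:
    blocked = set(BASE_BLOCKLIST)
    for word in used_this_month:
        blocked |= variations(word)
    return blocked


def remaining_hashtags(text: str, blocklist: Set[str]) -> List[str]:
    """Hashtags left after filtering, in the order they appear in the tweet."""
    tags = (t.lower() for t in HASHTAG.findall(text))
    return [t for t in tags if t not in blocklist]


blocklist = build_blocklist(["door"])
tweet = "Yesterday was #doors. Today's #vss365 #prompt is #lantern"
print(remaining_hashtags(tweet, blocklist))  # ['lantern']
```

Because yesterday's word and its plural are blocked, only the new word survives, which is the ideal case described above.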
When the prompt word is incorrectly recorded, it is usually because the filtering step failed to exclude some hashtag(s). This list is manually curated and must be tailored to the current Host/Hostess. This is easily corrected for future days by adjusting the blocklist. Correcting the recording currently requires a manual intervention into the site database but will eventually be possible for me or a trusted few to correct through a (yet to be developed) admin panel.
Why is the word repeatedly recorded incorrectly?
Another issue computers face when working with written language is its lack of structure.
In computing, there are three major types of data: structured, semi-structured, and free-form. Structured data has a strict definition. Data must be written a certain way and is easily checked to see if it's correct or not. Semi-structured data has a defined structure but is not as strict; there is variety in how it is written. Free-form data has no defined form and can be written in whatever way is desired.
Computers work best with structured data and progressively get worse with semi-structured and free-form data. Being able to read and make sense of semi-structured and free-form data is a hot topic in computing right now because there is a lot of it. If a company has the ability to successfully convert these data types into understandable structured data, that's a competitive advantage.
The written word falls somewhere between semi-structured and free-form data. While English (for example) has a set of rules that dictate how sentences should be written, they can be broken at will. These rules also evolve over time. It's been repeatedly shown that Millennials and younger generations tend to send text messages without ending punctuation marks. If a period is included, the message has a greater chance of being interpreted as extra serious or insincere, while question marks are sometimes reserved for statements turned into rhetorical questions rather than actual questions (real questions, depending on where they are in the text message, may also lack end punctuation).
It's almost like all spelling and grammar rules are arbitrary and made up. Almost.
The prompt word has an increased likelihood of being recorded incorrectly with every Host/Hostess change. Because each one has their own writing and delivery style and the finder's blocklist must be adjusted to them, it may take at least a week to tune the finder.
Unfortunately, because computers lack intuitive thinking and a tweet is semi-structured data, always picking up the correct hashtag becomes more complicated, especially when the Host/Hostess includes hashtags that are not on the blocklist and that change where in the tweet the prompt word appears.
Ideally, after filtering is complete, only one hashtag remains. Alternatively, there can be multiple hashtags remaining as long as the prompt word is always in the same position. If the word, for example, is always the second hashtag out of three remaining, everything is peachy. It is easily predictable where the word will be located.
The trouble occurs when other, non-filtered hashtags move where the prompt word is located. Because computers cannot think intuitively and struggle with understanding non-strictly structured data, the finder always picks the hashtag in the position where it expects the word to be located. If that assumption is broken, the other, incorrect hashtag is selected and recorded as the word.
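To illustrate how position-based selection goes wrong, here is a small sketch. It assumes the finder is told which (zero-based) position among the remaining hashtags holds the word; `pick_word` and the example hashtags are hypothetical.

```python
from typing import List, Optional


def pick_word(remaining: List[str], position: int) -> Optional[str]:
    """Pick the hashtag at the position where the word is expected (0-based).

    No meaning is involved: the position is trusted blindly, so an extra
    unfiltered hashtag that shifts the word also shifts what gets recorded.
    """
    if position < len(remaining):
        return remaining[position]
    return None


usual = ["amwriting", "lantern", "microfiction"]  # word is always 2nd of three
print(pick_word(usual, 1))    # lantern

shifted = ["amwriting", "writingcommunity", "lantern", "microfiction"]
print(pick_word(shifted, 1))  # writingcommunity (wrong word recorded)
```

One new hashtag ahead of the word is enough to make the finder record the wrong hashtag, and it will keep doing so until the blocklist is adjusted.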
Why not make the blocklist more restrictive to remove more hashtags?
I could! That is always a possibility. The question here is not whether it's possible, but whether it's sensible. I don't know ahead of time what hashtags the Host/Hostess will use or how the prompt tweets will be written and structured; I can only respond to what has already been posted. Additionally, the stricter the blocklist becomes, the greater the chance that the prompt word itself will be removed, which can cause complete failure (see the next question). A stricter blocklist also restricts how the Host/Hostess can write the prompt tweet. Remember, the site's purpose is to complement the game, not dictate it. Finally, every Host/Hostess usually finds their groove and preferred prompt tweet delivery style within a week.
For these reasons, I tend to keep the blocklist as small as possible to filter only as much as needed until a stable pattern appears and the prompt word hashtag is always in a predictable location.
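Here is a small illustration of how an overly strict rule can eat the prompt word itself. The "starts with a used word" rule below is a hypothetical example of over-filtering, not something the finder is known to do.

```python
from typing import Iterable, List


def strict_filter(tags: List[str], used_this_month: Iterable[str]) -> List[str]:
    """Hypothetical over-eager rule: drop any hashtag that STARTS WITH a used word."""
    used = [w.lower() for w in used_this_month]
    return [t for t in tags if not any(t.startswith(w) for w in used)]


tags = ["vss365", "prompt", "doorway"]  # hashtags in today's prompt tweet
fixed = {"vss365", "prompt"}
candidates = [t for t in tags if t not in fixed]

# "door" was used earlier in the month, so the new word "doorway" is removed too.
print(strict_filter(candidates, ["door"]))  # []
```

With nothing left after filtering, there is no word to record at all, which is worse than the occasional wrong hashtag a looser blocklist allows.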
Why is the word sometimes recorded hours after the prompt tweet or not at all?
The entire finder process can fail because the prompt tweet could not be identified. This is typically due to the Host/Hostess not using the #prompt hashtag as expected or the blocklist filtering too much. This is easily rectified by adjusting the blocklist and/or me running the finder manually.
Failure can also occur because the finder did not run at the correct time. Currently, I have to ask each Host/Hostess what time (with time zone) they intend to post the word. I then manually adjust the finder's kick-off times and restart it. To account for the possibility of a late posting, I set it to run four (4) times: the first run is 5 minutes after the intended time, and the remaining three follow every 20 minutes. So, for example, if the tweet is to be sent at 12:00 midnight, the finder runs at 12:05, 12:25, 12:45, and 1:05. If the prompt tweet is not found within this window, I run the finder manually as soon as I have the opportunity. I intend to eliminate this issue in the future by using Twitter's realtime filter API to pick up the prompt tweet immediately.
Finally, failure can also occur because of an error in the code or server downtime. Thankfully, these are rare, and one of the above events is typically the source of failure.
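The run schedule is simple arithmetic: first run at the intended time plus 5 minutes, then every 20 minutes, four runs in total. This sketch reproduces the midnight example; the function name is hypothetical, and it uses a naive local time, whereas a real scheduler would need to account for the Host/Hostess's time zone.

```python
from datetime import datetime, timedelta
from typing import List


def finder_run_times(intended: datetime, first_delay: int = 5,
                     interval: int = 20, runs: int = 4) -> List[datetime]:
    """First run `first_delay` minutes after the intended posting time,
    then one run every `interval` minutes, `runs` runs in total."""
    first = intended + timedelta(minutes=first_delay)
    return [first + timedelta(minutes=interval * i) for i in range(runs)]


midnight = datetime(2020, 7, 1, 0, 0)
print([t.strftime("%H:%M") for t in finder_run_times(midnight)])
# ['00:05', '00:25', '00:45', '01:05']
```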
How are the notification emails sent out?
Once the prompt tweet is found and all information is recorded, an email containing some of that information is created and sent out using the Mailgun service to the people who have voluntarily subscribed to the daily notification emails. In the case that this is an additional prompt for the same day, it makes sure to send out the latest received prompt.
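A rough sketch of the "send the latest prompt" step follows. It prepares, but does not send, a request to Mailgun's HTTP messages endpoint (`POST https://api.mailgun.net/v3/<domain>/messages` with basic auth user `api`). The domain, sender address, API key, and message wording are placeholders, and building one request per subscriber is an assumption made for clarity, not how the site necessarily batches its emails.

```python
import base64
from dataclasses import dataclass
from datetime import datetime
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


@dataclass
class Prompt:
    word: str
    recorded_at: datetime


def latest_prompt(prompts: List[Prompt]) -> Prompt:
    """When a day has more than one prompt, use the most recently recorded one."""
    return max(prompts, key=lambda p: p.recorded_at)


def build_mailgun_request(domain: str, api_key: str, recipient: str,
                          prompt: Prompt) -> Request:
    """Prepare (but do not send) a POST to Mailgun's messages endpoint."""
    body = urlencode({
        "from": f"#vss365 today <noreply@{domain}>",  # placeholder sender
        "to": recipient,
        "subject": f"Today's #vss365 prompt: {prompt.word}",
        "text": f"The word for today is: {prompt.word}",
    }).encode()
    token = base64.b64encode(f"api:{api_key}".encode()).decode()
    return Request(
        f"https://api.mailgun.net/v3/{domain}/messages",
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


prompts = [Prompt("door", datetime(2020, 7, 1, 0, 1)),
           Prompt("window", datetime(2020, 7, 1, 9, 30))]
req = build_mailgun_request("mg.example.com", "key-placeholder",
                            "writer@example.com", latest_prompt(prompts))
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```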
If for some reason the emails are not sent automatically, they too can be triggered manually, just like the finder.
If a word was recorded incorrectly and has to be corrected, the correction is only reflected on the website. It is not possible to change the notification emails once they are sent.
Can I access the word archive myself?
Absolutely! The entire word archive is available to download as an Excel spreadsheet! With it, you can slice, dice, and perform all the word analysis you desire!
How is the word archive file created?
The archive file is generated once a day, 5 minutes after the last scheduled finder run time. If the finder runs for the last time at 1:05, then at 1:10, the archive generation starts. It simply goes through the entire database and writes a spreadsheet containing all the data. It's really simple, actually.
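The archive job could be sketched as below: start 5 minutes after the last finder run, then write every recorded word in date order. To stay within Python's standard library, this sketch writes CSV (which Excel opens) rather than a true Excel file, and the three-column layout is a hypothetical example, not the archive's actual columns.

```python
import csv
import io
from datetime import date, datetime, timedelta
from typing import Iterable, TextIO, Tuple

Row = Tuple[date, str, str]  # (date, word, host): hypothetical column set


def archive_start_time(last_finder_run: datetime) -> datetime:
    """Archive generation starts 5 minutes after the last scheduled finder run."""
    return last_finder_run + timedelta(minutes=5)


def write_archive(rows: Iterable[Row], out: TextIO) -> None:
    """Write every recorded word, oldest first, as CSV."""
    writer = csv.writer(out)
    writer.writerow(["Date", "Word", "Host"])
    for day, word, host in sorted(rows):
        writer.writerow([day.isoformat(), word, host])


print(archive_start_time(datetime(2020, 7, 1, 1, 5)).strftime("%H:%M"))  # 01:10

buf = io.StringIO()
write_archive([(date(2020, 7, 2), "window", "@ExampleHostA"),
               (date(2020, 7, 1), "door", "@ExampleHostA")], buf)
print(buf.getvalue())
```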
Can I have more fine-grained data access?
Maybe! As part of reworking the underlying code to make the site easier to write and add new features, I have been building an API that provides all the information and features needed. The site has actually been using the API for everything since December 2019. I am working on the possibility of opening it up for public use, but that is still a good way off. I'll be sure to write up a post if/when it happens.
In the meantime, try using the search function! It has lots of options to help you find what you want. For example, the word search supports partial word matches, so searching "ai" will find all words with "ai" in them. You can also download the complete word archive for more in-depth searching.
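A partial word match is essentially a substring check. Here is a minimal sketch of that idea; whether the site's search ignores letter case is an assumption, and `search_words` is a hypothetical name.

```python
from typing import Iterable, List


def search_words(query: str, words: Iterable[str]) -> List[str]:
    """Partial match (case-insensitive here): every word containing `query`."""
    q = query.lower()
    return [w for w in words if q in w.lower()]


print(search_words("ai", ["rain", "door", "Aisle", "chair", "lantern"]))
# ['rain', 'Aisle', 'chair']
```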
I'm a programmer myself! Is it possible to assist in development?
Yes! I am a big believer in open-source development and make all source code for the website and supporting processes available on GitHub. All relevant repositories begin with vss365today. Full development plans are not available online (they are in my head lol), but I am more than willing to start writing those down and making them available.
As a side note, as we become increasingly aware of the privacy-invading practices of social media and other corporations, having the source code publicly available holds me accountable for my claims of respecting your privacy, because it can be inspected at any time to see what is going on. 😉
A thank you
As #vss365 has grown in popularity and involvement, so has the site. It is shared on Twitter almost daily and has been repeatedly referenced by prominent members of the community. I've had multiple Hosts/Hostesses contact me about the site to ensure it stays working and/or to thank me for my service. I even had people email me one time when visiting the site accidentally displayed my personal website. While I appreciate all of the feedback and am grateful for the positive reception, truly the greatest praise is seeing everyone use it. I have built it all up over the last 1.5 years in my free (and not so free) time in the middle of university, work, other programming projects, and life in general, and I will continue to do so as I enter graduate school next month. I operate it at my own cost, expecting nothing in return (although it is possible to contribute!), and hope not to send out a request for assistance any time soon.
At the end of the day, I thank you, the #vss365 community, for turning this little hobby project into a valuable part of your world. It brings me joy to see you embrace it, and it is a great pleasure to keep adapting it so it better serves you and your writing endeavors. Thank you so much. 😊