Word Cloud

Posted: Fri Apr 12, 2019 4:41 am
by rbytes

Re: Word Cloud

Posted: Fri Apr 12, 2019 6:45 am
by matt7
Looks great! I like the color palette. It would be cool to have an algorithm, run before the render, that takes a text file (or just a string) as input, scans it to build the word list (ignoring short common words like articles and prepositions), and also keeps track of word frequency. Then the rendering code could be adjusted to size the words in proportion to their frequency.

The first part would be the most challenging, though. I'm not sure how else to do it other than to manually specify a long list of words to exclude (like a, an, the, of, ...), and then you still have the problem of whether to try to deal with different forms of the same root (singular vs. plural form of a noun, different tenses of a verb, etc.).
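
A minimal sketch of the idea described above, written in Python rather than the BASIC used later in this thread. The exclusion list, the minimum word length, and the linear font scaling between 12 and 72 are all assumptions made for illustration, and nothing here tries to merge different forms of the same root.

```python
import re
from collections import Counter

# Hypothetical, deliberately short exclusion list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "to", "is", "it"}

def word_frequencies(text, stop_words=STOP_WORDS, min_length=3):
    """Count words, skipping stop words and anything shorter than min_length."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if len(w) >= min_length and w not in stop_words]
    return Counter(kept)

def font_sizes(counts, smallest=12, largest=72):
    """Scale each count linearly between the smallest and largest font size."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = max(high - low, 1)  # avoid dividing by zero when all counts match
    return {w: smallest + (c - low) * (largest - smallest) / span
            for w, c in counts.items()}

sample = "The cat and the dog chased the cat. A dog is a dog."
counts = word_frequencies(sample)
sizes = font_sizes(counts)
for word, count in counts.most_common():
    print(word, count, round(sizes[word]))
```

The most frequent word gets the largest size and the least frequent the smallest; a renderer could then draw the words in that order.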

Re: Word Cloud

Posted: Fri Apr 12, 2019 1:43 pm
by rbytes
Thanks. For this first version, I just put the most important (or at least more general) words closest to the start of the list. I have some newer versions I will post in the next few months, which do base the word size on frequency. The other idea, of inputting any text and excluding the articles, prepositions, etc. is a good one. I will see about implementing it.

There are a few word cloud apps available on the store. Some of them can shape the outside edge of the cloud, and I have duplicated that feature. Others can choose a rotational pattern for words, so I also added that to my latest edition.

I'm glad you like the color scheme. I find it effective for the "nostalgia" clouds, but prefer more saturated colors for the motivational or celebration themes. So I will eventually add some color theme options.

Re: Word Cloud

Posted: Fri Apr 12, 2019 2:47 pm
by Henko
Hi Richard,
What if I have hundreds of ideas:
[Image attachment: 380F02D7-A1B4-47D7-A811-6BB05B947B5B.png]

Some suggestions (you said it's only a start):
For use as a brainstorm tool:
- it must be possible to build the sheet one word at a time
- an accompanying note should be attached (in the background) to each word
- each word should be selectable, be it directly on the screen, or via a list
- when selected, the accompanying note must pop up for reading/updating
- a report on the printer with all words and (indented) notes
- maybe some kind of hierarchy relations between the words (main items - sub items)

And in general about the presentation: if the words are drawn using a sprite, each word may be randomly rotated a few degrees, which produces a more lively overall picture.

Re: Word Cloud

Posted: Fri Apr 12, 2019 3:28 pm
by rbytes
Great suggestions, Henk! I'll see what I can do. Feel free to post your own version(s), and let's see where we can take this.

I like your sample image. I should have mentioned that each run will produce non-overlapping words, but if you run multiple cycles without resetting, you can get this overprinted effect. It is chaotic and yet full of energy, so might be nice on something like a poster or greeting card.

Also you may occasionally find that the program "hangs". This is not a coding problem per se, but a "checkmate" of the non-overprint algorithm. The program is stymied because it cannot randomly find a free space large enough to print the next word. It may seem to us that there is space, but the program uses the text size command to find open spaces. In my later versions, I trap for this condition and exit the loop.
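
The "trap and exit" idea can be sketched as follows. This is a Python illustration, not the code from the later versions mentioned above: the word boxes are assumed to be precomputed (width, height) pairs standing in for the text size command, and the `max_attempts` limit of 500 is an arbitrary choice.

```python
import random

def overlaps(a, b):
    """True if two (x, y, width, height) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_words(sizes, canvas_w, canvas_h, max_attempts=500, rng=None):
    """Place (width, height) boxes at random positions without overlap.

    Returns (placed, skipped): placed maps box index -> (x, y, w, h);
    skipped lists the indexes that found no free spot within max_attempts.
    """
    rng = rng or random.Random()
    placed, skipped = {}, []
    for index, (w, h) in enumerate(sizes):
        if w > canvas_w or h > canvas_h:
            skipped.append(index)  # can never fit on this canvas
            continue
        for _ in range(max_attempts):
            x = rng.uniform(0, canvas_w - w)
            y = rng.uniform(0, canvas_h - h)
            box = (x, y, w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[index] = box
                break
        else:
            skipped.append(index)  # give up instead of looping forever
    return placed, skipped

# Three 60x20 boxes on a 100x30 canvas: only one can ever fit.
placed, skipped = place_words([(60, 20), (60, 20), (60, 20)], 100, 30,
                              rng=random.Random(1))
print(len(placed), "placed;", len(skipped), "skipped")
```

Capping the attempts trades completeness for a guaranteed exit: a word may be skipped even though a free spot exists, but the loop can no longer stall.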


Re: Word Cloud

Posted: Sun Apr 14, 2019 3:51 am
by rbytes
Here is a program named Parser that scans a text file and removes words that are deemed unsuitable for a word cloud. At the end, array G$ is filled with the suitable words. More work is needed, because some words need to stay in groups to retain their combined meaning. Some hyphenated examples are shown. Character 160 could also be inserted between words to keep them in groups.

Parser by rbytes
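
The character 160 idea can be illustrated with a short Python sketch (this is not the Parser code, which is not reproduced in this thread). The phrase list and the separator set are assumptions chosen for the example.

```python
import re

NBSP = chr(160)            # character 160, the non-breaking space
SEPARATORS = r"[ ,.:;&]+"  # assumed separator characters; 160 is not among them

def protect(text, phrases):
    """Join the words of each listed phrase with character 160."""
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", NBSP))
    return text

def split_words(text):
    """Split on the separators only, so protected groups stay in one piece."""
    return [w for w in re.split(SEPARATORS, text) if w]

text = "we visited new york and ice cream shops"
words = split_words(protect(text, ["new york", "ice cream"]))
print([w.replace(NBSP, " ") for w in words])
```

Because character 160 looks like an ordinary space when drawn, a protected group can be rendered as is once it has survived the split.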

Re: Word Cloud

Posted: Sun Apr 14, 2019 6:17 pm
by Henko
I made some adaptations to the code to make it faster. With the present test data both programs take very little time, but with larger files the speed difference (a factor of 6) may be of value.
The main change was to sort the arrays, which permits more efficient processing.

' read text file and rejectables file and sort the 2 tables

file "parser_reject.txt" input p$   ' input array with words to remove
SPLIT P$ TO K$,L WITH "|" ! sort k$
sep$=" ,.:;&"                       ' input text to be parsed
file "parser_text.txt" input a$ ! a$=lowstr$(a$)
SPLIT a$ TO m$,n WITH sep$ ! sort m$ ! print a$ ! print

' eliminate duplicates and adverbs (m$ is sorted, so duplicates are adjacent)
k=0                                 ' m$(0) is always kept
for i=1 to N-1 ! z$=m$(i)
  if z$=m$(k) or RIGHT$(z$,2)="ly" then continue
  k+=1 ! m$(k)=z$
  next i
N=k+1                               ' number of words kept so far

' eliminate rejectables (K$ is sorted, so the scan can stop early)
k=-1                                ' restart the output index
for i=0 to N-1 ! z$=m$(i) ! reject=0   ' reset the flag for every word
  for j=0 to L-1
    if z$<K$(j) then break
    if z$=K$(j) then ! reject=1 ! break ! end if
    next j
  if reject then continue
  k+=1 ! m$(k)=z$
  next i
N=k+1                               ' number of desired words

FOR u=0 TO N-1 ! PRINT M$(u) ! NEXT u    ' print the desired words


Re: Word Cloud

Posted: Wed Apr 17, 2019 1:49 pm
by rbytes
Nice improvements. I will try to adapt it for Word Cloud.

I will sort the rejected word array, but won't sort the desired words, because they need to stay in priority order to display from large to small size.
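
One way to combine the two requirements, sketched in Python as an illustration only: the reject list is sorted once and searched with a binary search, while the desired words are never sorted, so they keep their priority order. The function name and the sample data are hypothetical.

```python
from bisect import bisect_left

def filter_in_order(words, rejects):
    """Drop rejected and repeated words while keeping first-seen order."""
    sorted_rejects = sorted(rejects)  # sort the reject list once
    seen = set()
    kept = []
    for word in words:
        pos = bisect_left(sorted_rejects, word)  # binary search
        rejected = pos < len(sorted_rejects) and sorted_rejects[pos] == word
        if rejected or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept

words = ["family", "the", "home", "of", "family", "travel"]
print(filter_in_order(words, ["of", "the", "a", "an"]))
```

Since duplicates are no longer adjacent in an unsorted list, a set of words already seen takes over the job that sorting did in the faster version above.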