Word Cloud

rbytes
Posts: 1963
Joined: Sun May 31, 2015 12:11 am
My devices: iPhone X
iPad 4
MacBook
Dell Inspiron laptop
CHUWI Plus 10 convertible Windows/Android tablet
Location: Calgary, Canada
Flag: Canada

Word Cloud

Post by rbytes » Fri Apr 12, 2019 4:41 am

;)
Attachments
86F60443-4AA5-47CA-9D0A-B28800F97BD2.png (615.3 KiB) Viewed 312 times
Last edited by rbytes on Thu Jun 13, 2019 5:06 am, edited 1 time in total.
Zzzzz

matt7
Posts: 105
Joined: Sun Jul 12, 2015 5:00 pm
My devices: iPhone 8, Windows
Location: Kentucky, USA

Re: Word Cloud

Post by matt7 » Fri Apr 12, 2019 6:45 am

Looks great! I like the color palette. It would be cool to have an algorithm that runs before the render, takes a text file (or just a string) as input, scans it for the word list (ignoring short common words like articles and prepositions), and keeps track of word frequency. Then the rendering code could be adjusted to size the words in proportion to their frequency.

The first part would be the most challenging though. I'm not sure how else to do it other than to manually specify a long list of words to exclude (like a, an, the, of, . . . ), and then you still have the problem of whether you try and deal with different forms of the same root (singular form vs. plural form of a noun, different tenses of a verb, etc.).
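
A minimal sketch of the scan described above, written in Python rather than smart BASIC so it can be run anywhere. The exclusion list, the function names and the sample sentence are illustrative assumptions, and the plural folding is deliberately crude: it only hints at how hard the root-form problem is.

```python
import re
from collections import Counter

# Hypothetical, deliberately short exclusion list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "with", "is", "it"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-excluded word occurs in text."""
    # Lowercase first, then pull out runs of letters, digits, apostrophes and hyphens.
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

def naive_singular(word):
    """Very rough plural folding: strips one trailing 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

sample = "The drone takes photos. A photo of the drone is easy to share."
counts = word_frequencies(sample)
print(counts.most_common(3))
```

Applying `naive_singular` before counting would merge "photos" with "photo", but it would also mangle words such as "this" or "lens", which is why a manual exception list or a proper stemmer is usually needed.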


Re: Word Cloud

Post by rbytes » Fri Apr 12, 2019 1:43 pm

Thanks. For this first version, I just put the most important (or at least more general) words closest to the start of the list. I have some newer versions I will post in the next few months, which do base the word size on frequency. The other idea, of inputting any text and excluding the articles, prepositions, etc. is a good one. I will see about implementing it.
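
One way frequency-based sizing could work, sketched in Python rather than smart BASIC. The linear mapping, the 12 to 72 point range and the sample counts are assumptions for illustration; the newer Word Cloud versions mentioned here may do it differently.

```python
def font_size(count, min_count, max_count, min_size=12, max_size=72):
    """Linearly map a word count onto a font size range."""
    if max_count == min_count:
        return max_size  # all words are equally frequent
    scale = (count - min_count) / (max_count - min_count)
    return round(min_size + scale * (max_size - min_size))

counts = {"drone": 5, "camera": 3, "video": 1}
lo, hi = min(counts.values()), max(counts.values())

# Most frequent first, so the largest words are placed while space is plentiful.
for word, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(word, font_size(n, lo, hi))
```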

There are a few word cloud apps available on the store. Some of them can shape the outside edge of the cloud, and I have duplicated that feature. Others can choose a rotational pattern for words, so I also added that to my latest edition.

I'm glad you like the color scheme. I find it effective for the "nostalgia" clouds, but prefer more saturated colors for the motivational or celebration themes. So I will eventually add some color theme options.

Henko
Posts: 882
Joined: Tue Apr 09, 2013 12:23 pm
My devices: iPhone,iPad
Windows
Location: Groningen, Netherlands
Flag: Netherlands

Re: Word Cloud

Post by Henko » Fri Apr 12, 2019 2:47 pm

Hi Richard,
What if I have hundreds of ideas?
380F02D7-A1B4-47D7-A811-6BB05B947B5B.png (1.86 MiB) Viewed 296 times
😂

Some suggestions (you said it's only a start):
For use as a brainstorm tool:
- it must be possible to build the sheet one word at a time
- an accompanying note should be attached (in the background) to each word
- each word should be selectable, be it directly on the screen, or via a list
- when selected, the accompanying note must pop up for reading/updating
- a printed report with all words and (indented) notes
- maybe some kind of hierarchy relations between the words (main items - sub items)
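
The brainstorm features listed above suggest a small tree of words with attached notes. The following Python sketch (not smart BASIC) shows one possible shape for that data; the `Idea` class, its fields and the report layout are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Idea:
    word: str
    note: str = ""
    children: list = field(default_factory=list)

    def add(self, word, note=""):
        """Attach a sub-item and return it so items can be added one at a time."""
        child = Idea(word, note)
        self.children.append(child)
        return child

def report(idea, depth=0):
    """Return the report lines: each word, then its note indented beneath it."""
    pad = "  " * depth
    lines = [pad + idea.word]
    if idea.note:
        lines.append(pad + "    " + idea.note)
    for child in idea.children:
        lines.extend(report(child, depth + 1))
    return lines

root = Idea("word cloud", "main item")
root.add("colors", "offer theme options")
root.add("layout").add("rotation", "tilt each word a few degrees")
print("\n".join(report(root)))
```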

And in general about the presentation: if the words are drawn using a sprite, each word may be randomly rotated a few degrees, which produces a more lively overall picture.


Re: Word Cloud

Post by rbytes » Fri Apr 12, 2019 3:28 pm

Great suggestions, Henk! I'll see what I can do. Feel free to post your own version(s), and let's see where we can take this.

I like your sample image. I should have mentioned that each run will produce non-overlapping words, but if you run multiple cycles without resetting, you can get this overprinted effect. It is chaotic and yet full of energy, so might be nice on something like a poster or greeting card.

Also you may occasionally find that the program "hangs". This is not a coding problem per se, but a "checkmate" of the non-overprint algorithm. The program is stymied because it cannot randomly find a free space large enough to print the next word. It may seem to us that there is space, but the program uses the text size command to find open spaces. In my later versions, I trap for this condition and exit the loop.
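
The "trap and exit" idea can be sketched as a cap on placement attempts. This is a Python illustration under stated assumptions, not the Word Cloud code: words are treated as plain rectangles, and `place_words`, `overlaps` and `max_tries` are made-up names.

```python
import random

def overlaps(a, b):
    """True if rectangles a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_words(sizes, screen_w, screen_h, max_tries=500, rng=random):
    """Try to place each (w, h) box at random without overlap.

    Returns (placed, skipped); a box that cannot be placed within
    max_tries attempts is skipped instead of looping forever.
    """
    placed, skipped = [], []
    for w, h in sizes:
        if w > screen_w or h > screen_h:
            skipped.append((w, h))
            continue
        for _ in range(max_tries):
            box = (rng.uniform(0, screen_w - w), rng.uniform(0, screen_h - h), w, h)
            if not any(overlaps(box, other) for other in placed):
                placed.append(box)
                break
        else:
            skipped.append((w, h))  # the "checkmate" case: give up on this word
    return placed, skipped

placed, skipped = place_words([(80, 30), (80, 30), (100, 100)], 100, 100,
                              rng=random.Random(1))
print(len(placed), "placed,", len(skipped), "skipped")
```

Skipping a word is only one option; retrying it at a smaller size would be another.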


Parser

Post by rbytes » Sun Apr 14, 2019 3:51 am

Here is a program named Parser that scans a text file and removes words that are deemed unsuitable for a word cloud. At the end, array G$ is filled with the suitable words. More work is needed, because some words need to stay in groups to retain their combined meaning. Some hyphenated examples are shown. Character 160 could also be inserted between words to keep them in groups.

Code:

/*
Parser by rbytes
April 2019
removes non-essential words from a text file, leaving key descriptive words
*/

a$="4K STUNNING VIDEO & PICTURES with breathtaking 4K Ultra High Definition and ultra-clear 13-megapixel stills. EASY-TO-USE no flying experience necessary to take dramatic aerial group photos and video, simply pick from 5 automated flight modes to get the shot of your choice: Selfie, Pilot, Orbit, Journey, Follow Me. SAFE TO FLY with built-in Indoor Positioning System to allow Breeze to hold its position indoors and outdoors; the propeller protectors preventing them from coming into contact with other objects. Also, automatically returns to home and lands with the tap of a button. SOCIAL Download the app on your mobile device to control and navigate Breeze, edit photos & videos, then instantly share to your favorite social media sites. PORTABLE & LIGHTWEIGHT Compact and on-the-go, foldable propellers, this powerful flying camera drone weighs less than 1 pound and easy to carry."
PRINT a$
PRINT


' remove commas

flag=1
WHILE INSTR(a$,",",flag)>-1
  temp=INSTR(a$,",",flag)
  a1$=MID$(a$,0,temp)
  a2$=MID$(a$,temp+1,2000)
  a$=a1$&a2$
  flag=temp          ' the text has shifted left by one character, so resume at the same index
END WHILE

a$=LOWSTR$(a$)
PRINT a$


' create a string array containing the desired text

SPLIT A$ TO M$,N WITH " "


' lowercase the first word (redundant here, because LOWSTR$ above already lowercased the whole text)

M$(0)= LOWSTR$ (M$(0))


' create a string array of words to remove

P$="to|from|with|a|the|in|into|also|other|get|take|keep|each|us|allow|permit|enable|let|me|no|any|less|more|than|lesser|because|greater|them|also|&|this|that|of|your|their|his|it|its|hers|his|theirs|because|besides|then|now|an|and|that|it|they|he|she|is|must|shall|unless|until|was|on|under|above|beside|out|in|up|down|0|1|2|3|4|5|6|7|8|9|10"

SPLIT P$ TO K$,L WITH "|"


' erase the less-desirable words from the M$ text array

FOR t=0 TO L-1
  FOR u = 0 TO N-1
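    ' strip trailing periods, semicolons and colons first, so that "me." is compared as "me"
    IF RIGHT$(M$(u),1)="." OR RIGHT$(M$(u),1)=";" OR RIGHT$(M$(u),1)=":" THEN M$(u)=LEFT$(M$(u),LEN(M$(u))-1)
    ' drop listed words and anything ending in "ly" (adverbs, but also words such as "fly")
    IF M$(u)=K$(t) OR RIGHT$(M$(u),2)="ly" THEN M$(u)=""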
  NEXT u
NEXT t
PRINT


' eliminate duplicates

FOR t=0 TO N-1
  FOR u = 0 TO N-1
    IF M$(u)=M$(t) AND t<>u THEN M$(u)=""
  NEXT u
NEXT t


' print the desired words

cnt=0
FOR u=0 TO N-1
  IF M$(u)<>"" THEN
    PRINT M$(u)
    cnt+=1
  ENDIF
NEXT u


' store the key words in a new array

DIM G$(cnt)
cnt=0
FOR u=0 TO N-1
  IF M$(u)<>"" THEN
    G$(cnt)=M$(u)
    cnt+=1
  ENDIF
NEXT u


PAUSE 5
Attachments
99E1570C-C806-4498-8315-1DA4EEB7E96A.png (339.44 KiB) Viewed 272 times
D74EBF31-A43D-4840-ABD5-5B3E262781C1.png (137.61 KiB) Viewed 272 times
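
The character 160 idea from the Parser post can be illustrated as follows. This is a Python sketch, not smart BASIC, and the sample phrase is made up; it only shows that splitting on an ordinary space leaves groups joined by character 160 intact.

```python
NBSP = chr(160)  # non-breaking space

text = "ultra" + NBSP + "high" + NBSP + "definition video and pictures"

# Splitting on an ordinary space leaves NBSP-joined groups intact.
# (A bare text.split() would not: Python treats chr(160) as whitespace.)
words = text.split(" ")
print(words)

# Swap the NBSP back to a normal space only when the word is drawn.
labels = [w.replace(NBSP, " ") for w in words]
print(labels)
```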


Re: Word Cloud

Post by Henko » Sun Apr 14, 2019 6:17 pm

I made some adaptations in the code to make it faster. With the present test data both programs run almost instantly, but with larger files the speed difference (a factor of 6) may be of value.
The main thing was to sort the arrays, which permits more efficient processing.

Code:

' read text file and rejectables file and sort the 2 tables

file "parser_reject.txt" input p$   ' input array with words to remove
SPLIT P$ TO K$,L WITH "|" ! sort k$
sep$=" ,.:;&"                       ' input text to be parsed
file "parser_text.txt" input a$ ! a$=lowstr$(a$)
SPLIT a$ TO m$,n WITH sep$ ! sort m$ ! print a$ ! print

' eliminate duplicates and adverbs
k=0
for i=1 to N-1 ! z$=m$(i)
  if z$=m$(k) or RIGHT$(z$,2)="ly" then continue
  k+=1 ! m$(k)=z$
  next i
N=k+1

' eliminate rejectables
k=-1
for i=0 to N-1 ! z$=m$(i)
  reject=0
  for j=0 to L-1
    if z$<K$(j) then break
    if z$=K$(j) then ! reject=1 ! break ! end if
    next j
  if reject then continue
  k+=1 ! m$(k)=z$
  next i
N=k+1

FOR u=0 TO N-1 ! PRINT M$(u) ! NEXT u    ' print the desired words

PAUSE 5
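
For readers who do not use smart BASIC, here is roughly the same sorted-array technique in Python. It is a paraphrase under assumptions, not a line-by-line port: the function names and sample text are invented, and the "ly" test is left out.

```python
def unique_sorted(words):
    """Sort, then drop duplicates by comparing each word with the last one kept."""
    out = []
    for w in sorted(words):
        if not out or w != out[-1]:
            out.append(w)
    return out

def remove_rejects(words, rejects):
    """Both inputs must be sorted; the inner scan stops as soon as it passes w."""
    kept = []
    for w in words:
        rejected = False
        for r in rejects:
            if w < r:
                break          # every later reject is larger still
            if w == r:
                rejected = True
                break
        if not rejected:
            kept.append(w)
    return kept

text = "the drone and the camera drone share the video"
rejects = sorted(["the", "and", "a", "of"])
print(remove_rejects(unique_sorted(text.split()), rejects))
```

Sorting makes duplicates adjacent, so one pass removes them, and the early `break` avoids scanning the whole reject list for most words.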


Re: Word Cloud

Post by rbytes » Wed Apr 17, 2019 1:49 pm

Nice improvements. I will try to adapt it for Word Cloud.

I will sort the rejected word array, but won't sort the desired words, because they need to stay in priority order to display from large to small size.
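
A possible shape for that compromise, sketched in Python rather than smart BASIC: only the reject list is sorted (and searched with a binary search), while the desired words keep their original priority order. The function names and sample data are assumptions.

```python
from bisect import bisect_left

def is_rejected(word, sorted_rejects):
    """Binary search in the sorted reject list."""
    i = bisect_left(sorted_rejects, word)
    return i < len(sorted_rejects) and sorted_rejects[i] == word

def filter_keep_order(words, rejects):
    """Drop rejects and duplicates while keeping first-seen order."""
    sorted_rejects = sorted(rejects)
    seen = set()
    kept = []
    for w in words:
        if w in seen or is_rejected(w, sorted_rejects):
            continue
        seen.add(w)
        kept.append(w)
    return kept

priority = ["stunning", "video", "with", "the", "drone", "video", "camera"]
print(filter_keep_order(priority, ["the", "with", "a"]))
```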
