There is clearly a lot of confusion here.
A media server can be software only, or it can come with dedicated hardware, much as Titan runs on your own PC with Titan One/Titan Mobile or on a dedicated console such as the Tiger Touch/Pearl Expert/Sapphire Touch.
Software alone is cheaper but may be limited by the capability of the PC/laptop. Hardware is more expensive but is designed specifically for the job. In the case of media servers this can mean the ability to run multiple layers (for example, fading across video clips) over multiple full HD outputs at full frame rate, with a lot of image processing happening in real time. For high-end applications you could build a fast PC with multiple video cards to do this, but it's probably easier to buy a purpose-built machine that can be rack-mounted and cope with the demands of touring. For low-end applications an average PC or laptop will probably be more than adequate.
To quote from the Media Server Wiki:
The growing use of motion graphics in environments such as Theatre, Dance, Corporate Events and rock tours has led to the development of media servers designed specifically for live events. These machines are often high-spec home computers with increased RAM or hard drive technologies such as RAID arrays or solid-state drives. They are then supplied with software which allows the control and manipulation of video content, much like VJ software. One of the primary functions of these machines is to allow current show control technologies to control the playback of video content. Thus, a media server system may include inputs for DMX512-A, MIDI or similar control protocols.
By 'current show control technologies' it essentially means lighting desks which, in turn, are designed to control things using 8-bit channels (i.e. 0-255 per channel). While a media server can in principle be controlled using DMX512 or MIDI, in reality most tend to be controllable only by Art-Net or similar network protocols such as sACN. I would imagine the main reason for this is that the potential number of control channels required makes physical DMX impractical.
Titan (or any other lighting desk software) will communicate with a media server using Art-Net, either on the same machine or over a network. There's no difference.
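To give a rough idea of what the desk is actually sending, here is a minimal Python sketch that builds a single ArtDmx packet (the Art-Net message that carries one universe of 8-bit channel values). The function name build_artdmx and its parameters are my own for illustration, not part of Titan or any media server; the byte layout follows the published Art-Net packet format as I understand it.

```python
import struct

ARTNET_PORT = 6454  # standard Art-Net UDP port


def build_artdmx(universe, channels, sequence=0):
    """Build one ArtDmx packet (illustrative helper, not a Titan API)."""
    data = bytes(channels)  # raises ValueError if a value is outside 0-255
    if not 0 <= universe <= 0x7FFF:
        raise ValueError("universe must fit in 15 bits")
    if not 1 <= len(data) <= 512:
        raise ValueError("a universe carries 1 to 512 channels")
    if len(data) % 2:
        data += b"\x00"  # pad so the data length is even

    packet = b"Art-Net\x00"                   # fixed 8-byte ID
    packet += struct.pack("<H", 0x5000)       # OpCode ArtDmx, low byte first
    packet += struct.pack(">H", 14)           # protocol version, high byte first
    packet += bytes([sequence & 0xFF, 0])     # sequence, physical port
    packet += struct.pack("<H", universe)     # SubUni (low byte), then Net
    packet += struct.pack(">H", len(data))    # data length, high byte first
    return packet + data
```

Sending it is just a UDP datagram to port 6454 (for example with socket.sendto), addressed to 127.0.0.1 if the media server is on the same machine or to its network address otherwise. The packet itself is identical in both cases.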
In the case of Screen Monkey the two most significant channels relating to playback are 'Clip Select' and 'Playback'. Clip Select is the one that will show thumbnails in the attribute editor when patched as an active fixture. Clicking on a thumbnail will select the clip. 'Playback' has three functions: 'Clear Layer', 'Play Clip' and 'Pause'. If Playback is set to 'Clear Layer' then nothing will happen when you select a clip. When you change this to 'Play Clip' the selected clip will start. If you select a clip while 'Play Clip' is active it will start immediately. The active fixture thumbnails just represent values on the Clip Select channel. If you didn't have it patched as an active fixture you could still select the clips by number; you just wouldn't have an image showing you what each one is. Most media servers work in much the same way.
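The interaction between the two channels can be modelled as a tiny state machine. This is only a sketch of the behaviour described above, not Screen Monkey's actual code: the Layer class and its fields are hypothetical, the modes are plain strings rather than the real DMX value ranges (those come from the fixture personality), and the 'Pause' behaviour is my assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the three functions on the Playback channel.
CLEAR_LAYER, PLAY_CLIP, PAUSE = "Clear Layer", "Play Clip", "Pause"


@dataclass
class Layer:
    clip_select: int = 0            # value on the Clip Select channel
    playback: str = CLEAR_LAYER     # function chosen on the Playback channel
    playing: Optional[int] = None   # clip currently on the layer, if any
    paused: bool = False

    def set_clip_select(self, value: int) -> None:
        self.clip_select = value
        if self.playback == PLAY_CLIP:
            # Selecting a clip while Play Clip is active starts it immediately.
            self.playing = value

    def set_playback(self, mode: str) -> None:
        self.playback = mode
        if mode == CLEAR_LAYER:
            self.playing, self.paused = None, False
        elif mode == PLAY_CLIP:
            self.playing, self.paused = self.clip_select, False
        elif mode == PAUSE:
            # Assumption: Pause holds whatever clip is already on the layer.
            self.paused = self.playing is not None
```

For example, calling set_clip_select(3) on a fresh Layer leaves playing as None because Playback is still on 'Clear Layer'; calling set_playback(PLAY_CLIP) afterwards starts clip 3, and a further set_clip_select(5) switches straight to clip 5.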